PREFACE  TO  SECOND  EDITION


The  changes  which  have  been  made  in  this  edition  have  as  their 
purpose  the  better  adaptation  of  the  book  for  class-room  teaching 
of  the  elements  of  statistical  methods  useful  in  the  biostatistical  and 
medical  fields.  These  changes  have  taken  the  form  of  additions, 
omissions,  and  rearrangements  of  the  material.  To  a considerable 
extent  the  book  has  been  rewritten.  On  account  of  the  widespread 
interest  in  the  matter  at  the  present  time  a chapter  has  been  added 
dealing  with  the  logistic  curve. 

As  before,  I am  deeply  indebted  and  grateful  to  my  colleagues 
for  help  in  the  preparation  of  this  volume:  especially  to  Prof. 
Lowell  J.  Reed  and  to  Dr.  John  Rice  Miner.  Their  suggestions, 
advice,  and  criticism  have  been  invaluable,  and  they  have  also  given 
much  aid  in  matters  of  computation,  etc.  To  the  artist  of  the 
staff  of  the  Department  of  Biology,  Mr.  Arthur  Johannsen,  I am 
indebted  for  the  new  illustrations,  and  to  Miss  Hermine  Grimm  for 
help  in  the  details  of  manuscript  preparation  and  proof-reading. 
Two  former  students,  Dr.  R.  B.  Tewksbury  and  Dr.  T.  J.  LeBlanc, 
have  been  helpful  with  critical  suggestions  for  the  revision.  I am 
very  grateful  to  Prof.  Haven  Emerson  of  Columbia  University  and 
Dr.  T.  F.  Murphy,  Chief  Statistician  for  Vital  Statistics  of  the 
Census  Bureau,  for  help  in  getting  the  new  material  incorporated 
in  Chapter  III.  I am  obliged  to  the  Macmillan  Company  for  per- 
mission to  reprint  in  modified  form  the  material  in  Chapter  I under 
the  heading  “The  Nature  of  Statistical  Knowledge”  from  an  earlier 
book,  “Modes  of  Research  in  Genetics,”  of  which  that  company
owns  the  copyright.  Finally  to  my  old  and  dear  friends  G.  Udny 
Yule,  F.  R.  S.,  and  Major  Greenwood,  F.  R.  S.,  I am  deeply  grate- 
ful for  their  permission  to  reproduce  their  portraits  in  this  volume. 

Raymond  Pearl. 


October,  1930.
PREFACE  TO  FIRST  EDITION


This  book  is  the  result  of  many  years’  experience  in  attempting 
to  teach  biometric  methods  to  biologists  and  medical  men.  Its 
faults  and  its  merits,  if  any,  both  derive  mainly  from  that  experi- 
ence. Perhaps  nearly,  if  not  quite,  every  traditional  canon  of  sup- 
posedly sound  pedagogy  in  the  teaching  of  mathematics  is  done 
more  or  less  violence  to  in  the  pages  that  follow.  For  this,  as  an 
admirer  in  some  degree  of  tradition  in  general,  I am  sorry.  My 
only  plea  in  extenuation  is  a merely  pragmatic  one.  The  mode  of 
exposition  of  the  subject  followed  in  this  book  works.  I know  be- 
cause I have  tried  it,  many  times  and  on  many  people.  Our  students 
seem  to  like  the  subject,  and  to  feel  that  they  get  something  of  value 
out  of  our  presentation  of  it.  Perhaps  a teacher  ought  not  to  ask 
any  more  than  this.  Certainly  I am  not  disposed  to,  of  men  and
women  whose  primary  interest  is,  and  will  continue  to  be,  in  biology 
and  medicine,  and  most  certainly  not  in  mathematics. 

And  there  is  this  further  to  be  said  on  the  point:  whether  the 
mathematician  likes  it  or  not,  there  are  now  and  there  will  continue 
to  be,  many  biologists  and  medical  men  who  are  going  to  use 
biometric  methods  in  their  work  whether  they  have  had  any  special 
mathematical  training  or  not.  If  we,  who  are  charged  with  the 
elementary  teaching  of  these  persons,  insist  on  a rigorous  mathe- 
matical approach  to  the  subject  at  every  point,  with  complete 
analytical  proofs  of  every  step,  the  net  result  with  the  vast  major- 
ity of  students  will  simply  be  to  disgust  them,  and  drive  them 
away  from  such  sound  elementary  training  as  they  might  other- 
wise be  willing  to  accept,  and  from  which  they,  my  colleagues, 
and  I,  at  least,  agree  that  they  do  profit.  In  writing  this  book, 
therefore,  I have  tried  to  present  the  mathematical  matters  necessarily
involved  in  a language  and  with  a logical  method  of  approach
which  is  not  only  capable  of  being  understood  by  the
primarily  biologic  or  medical  reader,  but  to  which  persons  of  this 
type  of  mind  and  training  are  sympathetic. 

This  book,  as  its  title  indicates,  is  and  is  intended  to  be,  only 
an  introduction  to  the  subject.  Many  matters  are  omitted  which 
might  properly  find  a place  in  it.  It  is  my  belief,  however,  that 
in  the  present  state  of  development  of  biometry  itself,  and  in  the 
use  which  is  actually  being  made  of  its  principles  in  biology  and 
medicine  by  those  who  are  not,  and  never  will  be,  primarily 
specialists  in  this  field,  there  is  more  need  for  a simple  exposition 
of  the  basic  elements  of  the  subject  than  for  an  exhaustive  treat- 
ise. The  latter  will,  of  course,  come  in  time,  but  for  the  present 
it  seems  to  me  better  to  ground  the  student  in  elementary  prin- 
ciples, and  give  him  an  introduction  to  the  original  sources,  which 
he  may  follow  up  then  for  himself,  to  any  degree  he  likes.  In 
this  connection  there  may  be  some  inclined  to  criticize  because  of 
the  brevity,  and  sometimes  derivative  character,  of  the  reading 
lists  at  the  ends  of  the  chapters.  The  proper  policy  to  pursue  in 
this  matter  has  greatly  puzzled  me.  I have  in  manuscript  a toler- 
ably extensive  and  penetrating  bibliography  of  vital  statistics  and 
biometry.  I might  easily  have  printed  the  whole  of  it  herein. 
But  again,  the  policy  I have  actually  chosen  to  follow,  after  much 
deliberation,  is  based  upon  my  teaching  experience,  which  is  to 
the  effect  that  one  can  cajole  a busy  student  into  only  a definitely 
limited  amount  of  collateral  reading.  It  is  my  conviction  that  it 
is,  in  a practical  sense,  better  to  recognize  this  fact  frankly,  and 
choose  carefully  a limited  list  of  references,  than  to  incorporate 
into  a book  which  is  not  in  any  sense  an  original  source  an  ex- 
tensive bibliography.  I am,  in  this  particular  case,  the  more 
happily  led  to  this  conclusion  because  of  the  splendidly  thorough 
bibliography  of  the  important  original  sources  which  already  exists 
in  Yule’s  “Introduction  to  the  Theory  of  Statistics,”  which  is,  of 
course,  the  classic,  model  text-book  of  modern  statistical  methods, 
and  is  available  to  everyone. 

This  book  is  written  for  the  medical  reader  primarily.  The 
illustrations  of  method  are  mainly  chosen  from  that  field.  Bio- 
metric methods  already  have  a secure  place  in  general  biology. 
Their  use  is  developing  in  the  medical  field  with  extraordinary
rapidity  just  now.  It  has  seemed  to  me  on  this  account  that  an 
elementary  introduction  to  the  subject  designed  primarily  and 
directly  for  medical  readers  might  be  found  particularly  useful  at 
this  time. 

I am  indebted  to  various  persons  in  many  ways  for  help  in  the 
making  of  this  book,  though  for  its  defects  I am  alone  responsible. 
First  of  all,  to  my  colleagues  in  this  laboratory,  who  have  loyally 
helped  in  the  organization  and  development  of  our  teaching  work 
to  its  present  stage,  I owe  a debt  which  I cannot  adequately  de- 
scribe. We  have  worked  out  together  our  present  method  of  teaching 
the  subject.  More  specifically,  I am  deeply  grateful  to  Professor 
Lowell  J.  Reed  for  reading  critically  the  manuscript  and  catching 
up  a number  of  errors  which  otherwise  might  have  slipped  by,  and 
for  discussing  with  me  the  most  appropriate  methods  of  presentation 
of  many  points,  both  in  this  book  and  in  our  courses  of  instruction. 
To  Dr.  John  Rice  Miner,  Miss  Agnes  Latimer  Bacon,  and  Dr. 
Flora  D.  Sutton  I am  indebted  for  the  arithmetic  work  on  many 
of  the  numerical  illustrations  of  method.  The  wisdom  and  sagacity 
of  Dr.  William  Travis  Howard,  Jr.  in  the  broad  fields  of  pathology, 
public  health  administration,  and  vital  statistics  have  been  freely 
at  my  disposal,  and  of  inestimable  aid  in  the  whole  development 
of  the  Department  of  Biometry  and  Vital  Statistics  of  the  School 
of  Hygiene  and  Public  Health,  of  which  development  this  book  is 
an  integral  part. 

Finally,  I wish  most  sincerely  and  gratefully  to  acknowledge 
something  of  what  I owe  to  the  great  master  and  creator  of  biom- 
etry, Professor  Karl  Pearson.  When,  nearly  twenty  years  ago  now, 
I spent  a winter  in  his  Biometric  Laboratory  at  University  College, 
London,  I got  a fund  of  inspiration  from  first-hand  contact  with 
the  working  of  his  mind,  which  the  passing  years  have  never 
lessened  or  dimmed,  and  which  I have  tried  to  pass  on  to  my 
students.  If  we  have  sometimes  differed  on  biologic  matters  in 
these  years,  it  has  meant  no  slightest  diminution  of  my  deep  and 
sincere  admiration  for  one  whose  sheer  intellectual  power  has 
rarely  been  equaled  in  the  whole  history  of  science.  Feeling  this 
way  it  is  a great  gratification  and  pleasure  to  me  that  Professor 
Pearson  has  allowed  me  to  present  to  the  readers  of  this  book  the
splendid  portrait  which  appears  on  page  58. 

In  the  little  verse  on  page  16  the  “file”  which  Robert  Recorde 
was  writing  about  was  “geometric.”  Such  a “fresshe  fine  witte” 
as  that  old  worthy’s,  however,  would  perceive  and  enjoy,  I am 
sure,  the  peculiar  aptness  of  the  application  of  his  lines  to  biometry 
today. 


Raymond  Pearl. 
An  Introduction  to


Medical  Biometry  and  Statistics 


CHAPTER  I 

PRELIMINARY  DEFINITIONS  AND  ORIENTATION 

To  an  ever-increasing  degree  modern  science  is  becoming 
quantitative  in  its  methods  of  thought  and  activity.  The  history 
of  science  from  the  beginning  shows  that  the  earliest  development 
of  any  discipline  is  purely  qualitative,  and  that  only  as  it  emerges 
from  this  state  and  passes  over  into  the  quantitative  phase,  in 
greater  or  less  degree,  does  it  begin  to  take  an  assured  place  in  the 
hierarchy  of  the  established  sciences.  Recent  examples  of  this 
change  from  a qualitative  to  a quantitative  point  of  view  are  found 
in  psychology  and  sociology.  With  the  development  of  knowledge 
and  of  an  appropriate  technic  eventually  any  natural  phenomenon 
which  can  be  observed  can  also  be  quantitatively  measured.  The 
entire  history  of  medicine  shows  that  there  has  been  almost  from  the 
first  an  earnest  desire  and  effort,  on  the  part  of  some  of  its  leaders, 
to  develop  quantitative  modes  of  thought  and  methods  of  work. 
The  large  measure  of  progress  which  has  been  made  in  this  direction 
is  sufficiently  evidenced  by  the  number  of  items  of  diagnostic  and 
clinical  significance  which  are  measured  and  recorded  in  quantita- 
tive terms. 

In  the  ever-increasing  specialization  which  occurs  in  science, 
and  the  multiplication  of  technical  journals  which  such  differentia- 
tion of  interest  necessarily  entails,  it  is  difficult,  not  to  say  impossi- 
ble, for  one  to  keep  abreast  of  all  the  newer  developments  even  in 
his  own  science,  to  say  nothing  of  cognate  subjects.  This  is 
particularly  true  for  the  practitioner  and  investigator  in  the  field
of  medicine.  The  consequences  are  unfortunate.  One  often  fails
to  get  the  benefit  of  applying,  in  his  own  subject,  what  might  be
very  useful  methods  or  ideas  from  another  science.  This  lack  of
familiarity  with  even  the  simplest  technical  terminology  of  one  of 
the  newer  special  fields  may  be  so  complete  as  to  be  embarrassing 
in  a general  scientific  gathering  or  discussion  of  any  sort.  It  is 
only  fair  that  any  one  proposing  to  set  out  the  bearings  of  one  of 
the  newer  and  somewhat  highly  specialized  branches  of  science 
upon  an  older  and  established  field  and  to  discuss  its  methods,
should  begin  by  clearly  defining  at  least  the  more  general  technical 
terms  he  intends  to  use. 


DEFINITIONS 

Biometry  is  a term  which  came  into  general  use  in  the  late 
nineties,  to  designate  that  branch  of  science  which  studies  by 
methods  of  exact  measurement  on  the  one  hand,  and  precise  and 
refined  mathematical  analysis  on  the  other  hand,  the  quantitative 
aspects  of  vital  phenomena.  It  is  a term  co-ordinate  with  biology 
in  its  comprehensiveness.  Indeed,  it  may  perhaps  happen  that 
with  the  passage  of  time  the  term  “biology”  will  be  used  to  cover 
only  qualitative  phases  of  vital  phenomena,  while  biometry  will 
be  the  identifying  term  for  all  discussions  of  measurements  or  counts 
of  living  things  in  the  widest  sense  of  the  words.  The  general 
tendency  of  all  science  is  to  proceed  always  toward  greater  and 
greater  precision  of  results  and  reasoning.  It  has  elsewhere  been 
pointed  out  that  “the  real  purpose  of  biometry  is  the  general  quan- 
tification of  biology.  Its  fundamental  point  of  view  is  that,  without 
a study  of  the  quantitative  relations  of  biologic  phenomena  in  the 
widest  sense,  it  will  never  be  possible  to  arrive  at  a full  and  adequate 
knowledge  of  those  phenomena.  This  point  of  view  insists  that  a 
description  which  says  nothing  about  the  magnitude  of  the  thing 
described  is  not  complete,  but,  on  the  contrary,  lacks  an  element  of 
primary  importance.  It  insists,  also,  that  an  experiment  which 
takes  no  account  of  the  probable  error  of  the  results  reached  is 
inadequate  and  as  likely  as  not  to  lead  to  incorrect  conclusions.” 

Biometry,  as  a definitely  recognized  branch  of  biologic  science, 
owes  its  origin  and  establishment  primarily  to  the  efforts  of  two 
men — the  late  Sir  Francis  Galton,  and  Karl  Pearson,  Galton 
Professor  of  Eugenics  in  University  College,  London.  In  a later 
chapter  the  part  played  by  each  of  these  men  will  be  set  forth  with
greater  particularity. 

The  definitions  of  statistics  given  by  Yule,  in  his  well-known 
Introduction  to  the  Theory  of  Statistics,  which  is  by  all  odds  the  best 
general  elementary  introduction  to  the  subject,  are  extremely 
clarifying  and  helpful.  He  says:  “By  statistics  we  mean  quan- 
titative data  affected  to  a marked  extent  by  a multiplicity  of  causes. 

“By  statistical  methods  we  mean  methods  specially  adapted  to 
the  elucidation  of  quantitative  data  affected  by  a multiplicity  of 
causes. 

“By  theory  of  statistics  we  mean  the  exposition  of  statistical 
methods. 

“The  insertion  in  the  first  definition  of  some  such  words  as 
‘to  a marked  extent’  is  necessary,  since  the  term  ‘statistics’  is  not
usually  applied  to  data,  like  those  of  the  physicist,  which  are 
affected  only  by  a relatively  small  residuum  of  disturbing  causes. 
At  the  same  time  ‘statistical  methods’  are  applicable  to  all  such 
cases,  whether  the  influence  of  many  causes  be  large  or  not.” 

There  is  another  way  in  which  we  may  define  statistics,  which 
has  important  bearing  upon  the  logical  development  of  the  subject. 
It  may  be  said  that: 

Statistics  is  that  branch  of  science  which  deals  with  the  frequency 
of  occurrence  of  different  kinds  of  things,  or  with  the  frequency  of 
occurrence  of  different  attributes  of  things. 

If  we  discuss  the  case  incidence  of  typhoid  fever  we  are  dealing 
with  the  frequency  of  occurrence  of  things,  for  what  we  say  is  that 
of  N people  constituting  a population  or  group,  a certain  number, 
A,  have  typhoid  fever  within  a given  interval  of  time,  while  during 
the  same  interval  another  number,  B = N - A,  do  not  have
typhoid  fever.  Here,  then,  are  two  kinds  of  things,  namely,  people 
who  have  typhoid  fever  and  people  who  do  not.  And  so  similarly 
for  all  other  cases  where  the  figures  with  which  we  are  presented 
are  simple  counts  of  the  number  or  frequency  of  occurrence  of 
physically  discrete  entities. 

Let  us  now  look  at  the  other  side  of  the  case.  Stature  is  one  at- 
tribute of  a man,  in  the  sense  that  the  word  “attribute”  is  here  used. 
Suppose  we  measure  carefully  the  stature  of  each  of  1000  men. 
We  can  then  sort  these  measures  (the  attributes)  into  a series  of
groups  such  that  each  group  shall  contain  only  statures  which  are 
nearly  alike,  say  differing  by  not  more  than  0.5  cm.  Then,  if 
we  count  the  number  of  cases  in  each  group,  we  shall  have  the 
frequency  of  occurrence  of  each  particular  kind  of  attribute  (i.e.,
particular  stature)  within  the  original  group  of  1000.  From  these 
frequencies  we  may  then  calculate,  by  simple  processes  to  be  fully 
explained  farther  on,  certain  derivative  constants  like  the  average 
stature,  etc.  But  these  derived  functions  are  all  implicit  in  the 
frequencies,  and  have  no  validity  beyond  that  which  inheres  in  the 
original  counts. 
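
The grouping and counting just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the eight statures are invented, the class width of 0.5 cm is taken from the text, and the use of each class midpoint to stand for its members when averaging is an assumed convention, not a procedure quoted from this book. The function names are hypothetical.

```python
import math
from collections import Counter


def frequency_table(statures_cm, width=0.5):
    """Count how many statures fall in each class of the given width.

    A stature x is placed in the class [lower, lower + width), and the
    class is labelled by its lower bound.
    """
    counts = Counter(math.floor(x / width) for x in statures_cm)
    return {index * width: n for index, n in sorted(counts.items())}


def mean_from_frequencies(table, width=0.5):
    """Average computed from the frequencies alone, using class midpoints."""
    total = sum(table.values())
    weighted = sum((lower + width / 2) * n for lower, n in table.items())
    return weighted / total


# Invented measurements, in centimeters, for illustration only.
statures = [170.1, 170.3, 170.6, 171.2, 171.4, 171.4, 172.0, 169.8]
table = frequency_table(statures)
grouped_mean = mean_from_frequencies(table)

for lower, n in table.items():
    print(f"{lower:.1f} to {lower + 0.5:.1f} cm: {n}")
print("mean from frequencies:", grouped_mean)
```

The mean obtained this way depends only on the class counts, which is the sense in which such derived constants are implicit in the frequencies.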

All  statistics  are  comprised  within  one  or  the  other  of  these  two 
categories,  frequencies  of  things  themselves,  or  of  the  attributes  of 
things. 

The  separateness  of  things  which  makes  them  countable  for 
statistical  purposes  may  be  relative  either  to  space,  or  to  time,  or 
to  both  space  and  time.  If,  upon  the  same  day,  as  in  a census,  we 
count  the  number  of  cases  of  typhoid  fever  existing  in  a city,  we 
shall  have  gathered  statistics  of  the  frequency  of  persons  with  ty- 
phoid fever,  upon  a space  base.  The  underlying  differentiant  factor 
which  makes  these  cases  countable  is  that  each  is,  at  the  same  in- 
stant of  time,  located  at  a particular  and  unique  region  in  space. 
Suppose,  on  the  other  hand,  we  consider  as  a universe  of  discourse 
1000  particular  persons  and  observe  these  same  persons  every  day 
for  a year  to  see  whether  typhoid  occurs  among  them,  it  being 
premised  that  they  do  not  move  about  at  all.  We  shall  then  have  at 
the  end  of  the  year  the  frequency  of  occurrence,  within  the  group,  of 
persons  with  typhoid  fever,  upon  a time  base.  Another  example  may 
perhaps  help  to  clarify  the  point.  We  may  study,  as  the  writer  once 
did,  the  variation  of  milk  production  by  dairy  cows  in  two  ways.  If 
we  examine  the  differences  in  amount  or  quality  of  milk  produced 
by  each  individual  cow  in  a large  herd  on  the  same  day,  we  shall  be 
studying  the  variation  in  milk  production  on  a space  base,  since
each  cow  is  a spatially  separate  entity.  But  suppose,  with  this 
same  herd,  we  pour  each  cow’s  milk  each  day  into  one  big  vat,  mix 
it  thoroughly  with  the  milk  of  all  the  other  cows  in  the  herd,  and 
then  weigh  or  measure  the  whole  amount  of  milk  in  the  vat  each 
day,  and  by  drawing  a sample  from  it  determine  the  butter-fat
percentage,  etc.  The  amount  and  quality  of  this  herd’s  milk, 
the  herd  now  being  one  single  spatial  entity,  will  vary  from  day  to 
day  throughout  the  year.  If  now  we  examine  this  daily  variation, 
we  shall  be  studying  the  variation  of  milk  production  upon  a time 
base. 
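
The distinction between the two bases may be made concrete with a small sketch. The yields below are invented for a hypothetical herd of three cows over three days, and the standard deviation is used as the measure of variation only as an assumption for illustration; the function names are likewise hypothetical.

```python
from statistics import pstdev

# Invented daily milk yields in kilograms:
# one row per day, one column per cow.
yields = [
    [12.0, 15.0, 9.0],   # day 1
    [11.0, 16.0, 10.0],  # day 2
    [13.0, 14.0, 8.0],   # day 3
]


def space_base_variation(yields, day):
    """Variation among the individual cows on one and the same day."""
    return pstdev(yields[day])


def time_base_variation(yields):
    """Variation of the pooled herd total (the 'one big vat') from day to day."""
    daily_totals = [sum(row) for row in yields]
    return pstdev(daily_totals)


print("space base, day 1:", space_base_variation(yields, 0))
print("time base, whole herd:", time_base_variation(yields))
```

In the first function the units counted are the separate cows at one instant; in the second the herd is a single entity and the units are the successive days.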

The  statistical  method  is  essentially  a technic,  which  finds  its
justification  in  its  usefulness  in  helping  to  solve  the  problems  of  the 
basic  sciences,  physics,  chemistry,  biology,  etc.  Statistics,  in  any 
proper  sense,  has  no,  or  at  best  few,  problems  of  its  own.  Its 
technical  problems  are  really  problems  of  mathematics.  The 
statistical  method  is,  or  should  be,  a working  tool  of  science,  just 
as  is  the  microscope  or  the  kymograph.  But  it  is  probably  of  wider 
utility  than  any  other  single  technical  method  which  science  has 
discovered  or  devised.  For  it  has  an  applicability  and  a usefulness, 
direct  or  indirect,  in  virtually  every  problem.  It  is,  in  short,  a 
fundamental  element  of  scientific  methodology. 

Biometry  deals  with  statistics  derived  from  living  things,  or 
things  which  have  at  some  time  been  living,  and  applies  statistical 
methods,  in  the  broadest  sense,  to  such  data. 

“Vital  statistics,”  for  which  a better  term  is  biostatistics,  is  the
special  branch  of  biometry  which  concerns  itself  with  the  data 
and  laws  of  human  mortality,  morbidity,  natality,  and  demography. 

In  this  book  the  attempt  will  be  made  to  show,  by  concrete 
examples,  how  the  point  of  view  of  biometry,  and  the  application 
of  modern  statistical  methods,  may  be  of  use  to  the  medical  man 
in  helping  him  to  draw  correct  conclusions  from  his  facts,  and  to 
solve  problems  constantly  arising  in  his  work,  which  he  cannot 
possibly  hope  to  solve  correctly  without  such  methods.  It  is  not 
to  be  expected,  or  perhaps  even  desired,  that  every  medical  practi- 
tioner or  investigator  shall  be  an  accomplished  mathematician. 
But  it  is  evident  enough  to  every  thoughtful  observer  that  clinical 
medicine  is  proceeding  by  great  strides  along  the  quantitative, 
scientific  pathway.  Every  step  in  this  direction  adds  to  the 
necessity  of  the  medical  man  having  at  his  command  the  necessary 
elementary  principles  for  dealing  easily,  confidently,  and  accurately 
with  quantitative  data. 



IMPORTANCE  OF  BIOMETRIC  IDEAS  AND  METHODS  IN  MEDICINE 

The  growing  recognition  by  medical  men  themselves  of  the 
importance  of  modern  biometric  methods  and  viewpoint  for  work 
in  medicine  was  forcibly  expressed  a few  years  ago  by  the  dis- 
tinguished clinician,  Dr.  Lawrason  Brown,  in  the  following  words*: 

“None  of  you  will  contradict  me  when  I say  that  statistics  are  very  dry, 
but  some  of  you  may  dispute  me  when  I say  that  only  by  statistics  does  the 
world,  lay  or  medical,  advance.  Consider  what  knowledge  is  and  you  will  see 
how  inseparable  it  is  from  statistics.  Medicine  is  no  exact  science,  and  diagnosis 
rests  largely  upon  the  law  of  probability  which,  in  turn,  is  statistical.  All  sci- 
entific experiments  are  statistical  arguments  in  favor  of  or  in  opposition  to  certain 
inductions  or  deductions.  Further,  statistics  lend  the  authority  that  is  necessary 
for  their  acceptance. 

“The  trouble  in  medicine  does  not  lie  with  the  statistical  method,  but  with 
the  medical  men  who  do  not  know  how  to  use  it.  I regret  to  state  that  I belong 
to  this  class  and  have  felt  keenly  that  in  medical  school  I did  not  have  an  oppor- 
tunity to  attend  a course  on  medical  statistics.  The  day  will  come,  gentlemen, 
when  such  courses  will  be  given,  when  the  law  of  probability  will  help  in  diag- 
nosis, when  the  coefficient  of  correlation,  now  explained  by  most  authorities  in 
such  terms  that  in  a few  minutes  my  idea  of  my  relation  to  my  surroundings  has 
become  totally  insufficient — when,  I say,  all  these  things  will  be  understood  by 
the  medical  graduate.  At  that  time  medical  men  will  cease  to  do  such  foolish 
things  with  statistics  as  to  try  to  add  cabbages  and  cows,  or,  what  is  nearly  as 
bad,  to  try  to  solve  problems  in  heredity  by  finding  how  many  parents  had  the 
disease  from  which  the  offspring  suffers  without  due  respect  to  many  other  very 
important  and  possibly  contradictory  details.  What  would  you  think  of  a book- 
keeper who  after  years  of  personal  experience  would  gather  up  the  bills  in  the 
cash  drawer  and  go  to  the  bank  with  the  statement  that  his  personal  experience 
led  him  to  believe  that  the  roll  of  bills  amounts  to  $1000?  The  receiving  teller
would  quickly  apply  the  statistical  method  and  few  would  venture  to  side  with 
the  bookkeeper,  no  matter  how  large  his  experience  had  been. 

“Do  not  misunderstand  me.  This  is  not  an  argument  in  favor  of  dry  sta- 
tistical articles  which  we  all  prefer  to  avoid  reading.  But  if  I can  make  you  see 
how  important  it  is  for  us  to  cease  using  the  pet  phrase  ‘my  personal  experience’ 
except  when  we  have  sufficient  data  to  support  it,  I shall  have  accomplished  what 
I had  hoped  for.” 

The  point  of  view  from  which  medical  problems  should  be 
attacked  by  quantitative,  biometric  methods  has  been  well  set 
forth  by  Greenwood¹¹  in  the  course  of  a discussion  of  some  animadversions
of  Sir  Almroth  Wright  upon  quantitative  methods,  when
he  describes  the  method  by  which  a therapeutic  problem  ought  to 
be  investigated.  Greenwood  remarks: 

* Brown,  Lawrason:  American  Review  of  Tuberculosis,  September,  1920,  vol.  iv. 
“Let  us  suppose  that  the  question  is  whether  a certain  treatment  is  of  advantage
in  acute  lobar  pneumonia.  We  must  first  inquire  whether  the  morbid  state
connoted  by  the  phrase  ‘acute  lobar  pneumonia’  is  clinically  recognizable.  The 
question  is  answered  in  the  words  of  Sir  William  Osler:  ‘No  disease  is  more  readily
recognized  in  a large  majority  of  cases.  The  external  characters,  the  sputum,  and 
the  physical  signs  combine  to  make  one  of  the  clearest  of  clinical  pictures.  The 
ordinary  lobar  pneumonia  of  adults  is  rarely  overlooked.’ 

“The  next  point  to  be  investigated  is  the  variation  of  fatality  in  cases  not 
treated  by  the  method  under  investigation. 

“(a)  Influence  of  Age. — That  the  fatality  increases  with  the  age  of  the 
patient  is  well  known  and  evidence  need  not  be  quoted  here.  Naturally,  in 
comparing  fatalities  it  will  be  necessary  to  correct  for  age. 

“(b)  Sex. — The  influence  of  sex  is  not  so  marked,  but  allowance  can  similarly 
be  made  for  it. 

“(c)  Secular  Variations. — It  would  appear  that  these  are  of  minor  importance.
It  also  appears  that  the  fatality  of  hospital  cases  from  different  institutions
in  the  same  country  during  the  same  period  varies  but  little.

“(d)  The  Influence  of  Social  Class. — Evidence  capable  of  being  analyzed  has
been  sparingly  published.  The  873  cases  recorded  by  the  British  Medical  Asso- 
ciation’s Collective  Investigation  Committee  in  1886  show  a corrected  fatality 
rate  of  17  per  cent.,  which  is  below  the  London  Hospital  rate  for  the  same  period. 
The  results  of  Huss  at  Stockholm,  more  than  forty  years  ago,  suggest  that  the 
fatality  in  the  Military  Hospital  was  about  seven-elevenths  of  the  rate  obtaining 
in  the  General  Hospital. 

“(e)  Influence  of  Race  or  Climate. — We  find  striking  differences  in  the  hospital
fatality  rates  of  different  countries,  the  rate  at  the  Stockholm  Hospital  in
the  ‘fifties’  of  last  century  being  far  below  that  recorded  for  the  same  period  at 
Vienna  or  Basel.  There  is  a less  striking  difference  between  the  recent  London 
figures  and  those  of  Chatard  from  Baltimore. 

“In  view  of  what  has  been  said,  it  will  be  plain  that  in  comparing  a series  of 
treated  cases  with  ‘general  experience’  attention  will  have  to  be  paid  to  the 
differences  noted,  all  of  which  can  be  tested  by  the  statistical  method.  When  a 
true  control  series  is  available,  it  will  still  be  necessary  to  allow  for  race  and 
environment.  An  inquiry  into  these  points  would  seem  a necessary  prelude  to 
an  evaluation  of  the  effects  of  any  specific  treatment. 

“These  are  all  questions  of  great  moment,  and  cannot  be  answered  by  appeal 
either  to  authority  or  to  the  introspective  notions  yielded  by  the  ‘experiential 
method.’ 

“Having  made  due  allowance  for  these  difficulties,  we  shall  proceed  to  com- 
pare the  rate  of  mortality  in  the  treated  and  untreated  cases.  This  will  involve 
a careful  sifting  of  the  material,  since  we  must  reject  such  cases  as  died  in  conse- 
quence of  some  accident  in  no  way  connected  with  the  evolution  of  the  disease. 
The  criteria  of  exclusion  must  be  defined,  and  no  case  excluded  without  the 
grounds  of  such  exclusion  being  clearly  stated  and  the  particulars  published  in 
full  to  give  others  an  opportunity  of  judging  the  sufficiency  of  the  criterion. 

“Next,  we  shall  in  some  cases  be  able  to  compare  the  percentages  and  deter- 
mine the  probability  that  such  difference  as  results  might  be  an  ‘error  of  random 
sampling.’  This  will  by  no  means  complete  the  task,  however,  since  it  might 
happen  that  the  treatment,  although  not  associated  with  a significant  reduction
of  fatality,  did  influence  the  course  of  the  disease.  The  features  which  it  is 
desired  to  measure  having  been  determined  on,  we  can  by  the  method  of  multiple 
correlation  endeavor  to  connect  the  variations  of  such  features  with  each  other 
and  with  those  of  the  therapeutic  factor  we  are  studying.  Since  in  general  it 
will  be  difficult  to  secure  controls  and  treated  samples  absolutely  alike  in  other 
respects,  the  method  of  correlation  is  likely  to  be  required  in  most  cases.  We 
shall,  indeed,  be  fortunate  if  we  are  able  to  ‘express  the  final  result  in  the  form  of 
a percentage.’ 

“I  have  outlined  the  process  by  which,  as  I think,  such  a problem  may  be 
investigated.  The  essence  of  the  whole  matter  is  to  ask  ourselves  at  every  turn, 
Is  the  control  a real  control?  What  is  the  probability  that  such  and  such  an 
event  is  due  to  such  and  such  a cause?  There  is  no  intrinsic  merit  in  numbers 
and  percentages  or  in  coefficients  of  correlation,  their  value  is  in  aiding  us  to 
think  clearly  and  compelling  us  to  express  conclusions  in  a language  which  all 
may  master  if  they  choose.” 
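
The comparison of percentages described in the quotation above may be sketched numerically. What follows is a minimal illustration only, not the procedure of the author quoted: it assumes a normal approximation to the difference of two proportions, and the counts, as well as the function name `two_proportion_z`, are hypothetical and invented for the example.

```python
import math

def two_proportion_z(deaths_a, n_a, deaths_b, n_b):
    """Return (difference in fatality rates, z statistic, two-sided p-value).

    Uses the pooled normal approximation; this is an assumed, modern
    stand-in for the 'error of random sampling' reasoning in the text.
    """
    p_a = deaths_a / n_a
    p_b = deaths_b / n_b
    pooled = (deaths_a + deaths_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_a - p_b, z, p_value

# Hypothetical counts: 30 deaths in 200 treated, 45 deaths in 200 untreated.
diff, z, p = two_proportion_z(deaths_a=30, n_a=200, deaths_b=45, n_b=200)
print(f"difference = {diff:.3f}, z = {z:.2f}, p = {p:.3f}")
```

A small tail probability suggests only that chance alone is an unlikely explanation of the difference; it says nothing as to whether the control was a real control, which is the question the quotation insists upon.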

Dr.  Alfred  E.  Cohn,  of  the  Rockefeller  Institute  for  Medical 
Research,  in  a recent  letter  regarding  the  work  of  the  Heart  Com- 
mittee of  the  New  York  Tuberculosis  and  Health  Association,  dis- 
cusses the  significance  of  statistics  in  medicine  from  a still  different 
angle.  He  has  kindly  permitted  quotation  from  this  letter  here. 

“The  value  of  these  investigations,  statistical  in  nature,  has  often  been  made 
the  subject  of  solicitous,  not  to  say  sceptical  enquiry.  The  Research  Committee 
is  nevertheless  convinced  of  the  value  of  its  enterprises.  It  sees  in  them  a con- 
tinuation of  that  effort  at  classification  of  diseases,  related  to  the  heart  in  this 
case,  which  has  always  been  recognized  to  be  a sound  and  valuable  tradition  in 
the  history  of  medicine,  and  indeed  an  indispensable  method  in  the  history  of 
science  in  general.  That  these  studies  should  be  reliable  in  the  sense  in  which 
statistical  studies  are  believed  not  to  be  so,  has  been  a matter  of  great  solicitude 
on  our  part.  The  method  of  work  has  been  described;  it  depends  on  securing 
reliability  by  attending  to  uniformity  in  nomenclature,  by  following  specific 
criteria  for  naming  diseases,  by  recording  phenomena  in  a uniform  fashion  on 
history  and  physical  examination  forms  which  have  been  carefully  constructed, 
by  supervising  and  by  collating  the  material  once  it  is  recorded  through  a staff 
of  statistical  clerks  which  now  has  had  a detailed  experience  of  several  years 
duration.  So  far  as  we  can  judge,  foresight  and  precaution  have  done  their 
share  in  assuring  satisfactory  results. 

“If  there  is  still  doubt  of  the  value  of  results  such  as  these,  the  doubt  must, 
it  seems,  rest  on  popular  conceptions — perhaps  popular  misconceptions — of  the 
nature  of  statistics.  What  physicians  require  and  what  after  all  they  must  have 
in  the  practice  of  medicine  is  information  twofold  in  nature.  They  must  know 
about  the  general  movements  of  diseases,  or  their  natural  history;  but  they  will 
also  desire  a knowledge  of  methods  which  make  applicable  to  individual  cases  the 
general  considerations  to  which  reference  has  been  made.  The  difficulty  is  here. 
Except  in  so  far  as  the  former  aids  an  understanding  of  the  latter,  it  can  scarcely 
be pretended that the statistical or general method can be a useful practical
instrument. Why this is so demands perhaps some discussion.

“The  search  for  law  in  biology  and  of  course  in  medicine,  rests  on  the  con- 
ception that  the  discovery  of  laws  has  served  the  physical  sciences  in  extraordi- 
narily useful  ways.  There  can  be  no  doubt  of  the  soundness  of  this  belief.  But 
precisely  what  the  analogy  is  between  law  and  the  individual  in  the  physical 
world,  and  law  and  the  individual  in  the  biological  one,  requires  precise  definition 
— more  precise  indeed  than  is  usually  accorded  to  this  matter.  If  there  is  a 
difficulty  it  lies,  we  believe,  in  misunderstanding  this  relation  of  law  to  the  indi- 
vidual in  the  physical  world.  The  view  we  take  may  be  illustrated.  In  the  case 
of  the  gas  laws  for  instance,  beginning  with  Boyle,  a number  of  statements  have 
been  made  which  permit  accurate  predictions  of  the  behavior  of  volumes  of  gas; 
that  these  laws  do  not  describe  the  behavior  of  individuals  within  the  volume  is 
amply  demonstrated  by  reflecting  on  the  fact  that  the  kinetic  theory  assumes 
violently  diverse  and  unpredictable  behavior  on  the  part  of  the  individual 
molecules  in  these  volumes.  The  laws  apply  to  the  mass,  the  volume,  the  aver- 
age; they  make  no  statements  concerning  individual  performance.  And  yet  the 
laws  are  invaluable;  they  and  their  kind  are  the  basis  of  calculation  in  the  prac- 
tical as  well  as  in  the  theoretical  world.  In  biology  and  in  medicine,  just  as  in 
physics,  it  is  not  to  the  individual  that  the  laws,  whatever  they  are  or  may  be 
discovered  to  be,  apply.  Individuals  represent  deviations  from  any  law  both  in 
biology  and  in  physics;  if  a law  is  sound,  the  deviations  from  the  average  must  not 
however  exceed  a certain  maximum,  the  probable  error.  If  it  does  there  is  no 
law  of  value.  Deviation  is  the  fate  of  the  individual;  uniformity  in  the  sense  of 
identity either of being or of behavior scarcely exists in any world. The general
behavior  of  patients  afflicted  by  typhoid  fever,  the  general  behavior  of  mobs  may 
be  known,  but  to  know  these  phenomena  has,  relatively  speaking,  little  meaning 
in  understanding  or  in  predicting  the  conduct  of  any  individual  in  a mob  or  in 
making  an  accurate  prognosis,  based  on  general  experience  in  the  case  of  any 
typhoid  fever  patient. 

“And  yet  no  one  denies  that  general  statements  can  be  made  and  are  useful 
in  physics  and  in  psychology  nor  that  general  statements  on  prognosis  and  on  the 
natural  history  of  diseases  have  value.  General  statements  and  inference  in 
individual  instances  each  have  their  domain  of  eminent  usefulness.  Harm  results 
only  when  the  nature  and  objects  of  the  two  are  confused.  This  is  the  direction 
in  which  the  Research  Committee  believes  it  comprehends  the  function  of  its 
labors.  Whether  in  the  natural  history  of  any  one  of  the  heart  diseases,  or  in  an 
estimation  of  prognostic  values,  or  in  the  measure  of  success  of  a therapeutic 
agent,  or  of  the  degree  of  relevance  of  social  or  economic  advice,  its  aim  is  the 
attempt  to  understand  general  movement.  No  other  indeed  is  possible.  It 
believes,  because  it  is  plain  teaching  of  the  history  of  science,  that  the  ability  to 
attain  orientation  of  this  sort  is  indispensable  in  envisaging  the  probable  course  of 
any  individual  life  or  of  any  individual  act.  Thought  and  action  would  otherwise 
be  chaos.” 

THE  NATURE  OF  STATISTICAL  KNOWLEDGE 

There  is  a very  general  tendency,  including  in  its  operation  not 
only  the  layman  but  also  the  professional  man  of  science,  toward 
the notion that there is a special virtue, a sort of transcendent
heuristic  worth,  in  such  knowledge  as  is  reached  by  the  examina- 
tion of  large  numbers  of  cases.  There  seems  to  be  a feeling,  some- 
times apparently  almost  mystic  in  its  origin  and  in  its  strength,  to 
the  effect  that  statistical  knowledge  is  a higher  and  better  kind  of 
knowledge  than  any  other.  Numberless  quotations  might  be  cited 
to  show  the  prevalence  of  this  view.  Every  one  has  seen  passing, 
as  it  were  in  review,  the  line  of  problems,  which,  if  we  may  trust 
the  assertions  of  the  interested  individuals,  “can  only  be  solved” 
by  the  application  of  the  statistical  method. 

Now  this  attitude  toward  statistical  knowledge  and  statistical 
ideas  (which,  of  course,  include  besides  the  compilation  of  large 
numbers  of  individual  instances,  the  concepts  of  averages,  approxi- 
mation, and  probability)  may  be  entirely  right  and  justifiable  and 
certainly  is  so  in  considerable  part.  Indeed,  a cautious  person  is 
bound  to  be  very  chary  about  even  suggesting  any  criticism  of  it 
when  he  considers  the  eminence  of  some  who  have  espoused  it. 
But the statistical method, as an organized and formulated scientific
technic, came only relatively lately into the field. A realistic examination
of its powers, sympathetic if critical, cannot do any harm.

It  is  the  object  of  the  following  remarks  to  discuss  statistical 
concepts  and  methods  with  the  purpose  of  trying  to  see  what  these 
methods  are,  in  fact,  capable  of  doing.  In  this  discussion  let  us 
endeavor  to  avoid  dogmatic  assertion,  since,  in  the  first  place, 
assertion  does  not  really  get  us  far  in  the  search  for  truth,  and,  in 
the  second  place,  the  writer  himself  feels  in  regard  to  these  questions 
very  far  from  that  serene  consciousness  of  being  quite  unassailably 
right  which  is  essential  to  proper  dogmatism.  Indeed,  it  is  for  the 
purpose  of  definitely  formulating  some  doubts,  which  have  grown  in 
his  mind  during  many  years,  that  this  discussion  is  written.  Very 
likely  some  will  not  agree  with  its  reasoning  or  its  tentative  conclu- 
sions, but  even  in  such  event,  it  may  help  the  disagreeing  reader  to  the 
more  complete  ordering  of  his  own  ideas  about  statistical  concepts. 

Let  us  first  consider  this  question:  What  caused  the  develop- 
ment of  the  statistical  viewpoint  and  method,  which  in  science  had 
such  an  important  growth  in  the  nineteenth  century?  For  what 
purposes  did  men  turn  to  the  statistical  method?  This  question 
has been very ably discussed by Theodore Merz in the second volume
of his History of European Thought in the Nineteenth Century,
and we cannot do better than follow his development of the matter.
Speaking of the origin of statistics, Merz says (loc. cit., pp. 554, 555):

“That  which  everywhere  oppresses  the  practical  man  is  the  greater  number 
of  things  and  events  which  pass  ceaselessly  before  him,  and  the  flow  of  which 
he  cannot  arrest.  What  he  requires  is  the  grasp  of  large  numbers.  The  suc- 
cessful scientific  explorer  has  always  been  the  man  who  could  single  out  some 
special  thing  for  minute  and  detailed  investigation,  who  could  retire  with  one 
definite  object,  with  one  fixed  problem  into  his  study  or  laboratory  and  there 
fathom  and  unravel  its  intricacies,  rising  by  induction  or  divination  to  some  rapid 
generalization  which  allowed  him  to  establish  what  is  termed  a law  of  general 
aspect  from  which  he  could  view  the  whole  or  a large  part  of  nature.  The  sci- 
entific genius  can  ‘stay  the  moment  fleeting’;  he  can  say  to  the  object  of  his 
choice,  ‘Ah,  linger  still,  thou  art  so  fair’;  he  can  fix  and  keep  the  star  in  the  focus 
of  his  telescope,  or  protect  the  delicate  fiber  and  nerve  of  a decaying  organism 
from  succumbing  to  the  rapid  disintegration  of  organic  change.  The  practical 
man  cannot  do  this;  he  is  always  and  everywhere  met  by  the  crowd  of  facts,  by 
the  relentlessly  hurrying  stream  of  events.  What  he  requires  is  grasp  of  num- 
bers, leaving  to  the  professional  man  the  knowledge  of  detail.  Thus  has  arisen 
the  science  of  large  numbers  or  statistics,  and  the  many  methods  of  which  it  is 
possessed.” 

Further  on  the  same  author  says  of  the  origin  of  the  science  of 
probability  (loc.  cit.,  pp.  567,  568): 

“The  necessity  of  having  recourse  to  elaborate  countings,  to  registrations  of 
births,  deaths,  and  marriages,  to  lists  of  exports  and  imports,  to  records  of  con- 
sumption and  production  of  foodstuffs  and  many  other  items,  forced  upon  those 
who  were  intrusted  with  the  gathering  and  using  of  these  data  the  observation 
that  all  such  knowledge  is  incomplete  and  inaccurate.  Owing  to  the  variability, 
within  certain  limits,  of  recurring  events  and  the  errors  of  counting  and  regis- 
tration, we  have  to  content  ourselves  always  with  approximation  instead  of 
certainty.  Error  bulks  very  largely  in  all  statistics,  and  vitiates  them;  and  as 
regards  coming  events,  our  minds  are  in  a state  of  expectation  rather  than  of 
assurance.  But  events  can  be  more  or  less  probable,  errors  can  be  greater  or 
smaller,  cumulative  or  compensatory,  and  our  expectations  may  be  well-  or 
ill-founded.  And  so  there  has  arisen  the  science  of  Probabilities  and  of  Chances, 
and  the  Theory  of  Error,  two  subjects  intimately  interwoven.  The  former  arose 
in  the  seventeenth  century  out  of  the  frivolous  or  vicious  practice  of  betting 
and  gambling,  whilst  the  latter  was  founded  when  astronomical  observations 
accumulated,  and  the  question  presented  itself  how  to  combine  them  so  as  to 
arrive  at  the  most  reliable  result.” 


Now from these two quotations, which may certainly be considered
as fairly stating the case, it is apparent that those circumstances
which led men to turn to statistical methods of reasoning
and  investigation  were  not  such  as  grow  out  of  an  increasing 
precision  and  certainty  of  knowledge  about  the  events  or  things 
under  consideration,  but  rather  were  quite  the  opposite.  In  other 
words,  the  statistical  point  of  view,  in  the  first  instance,  was  adopted 
as  an  admittedly  imperfect  means  of  getting  some  sort  of  knowledge 
about  a class  of  events  concerning  which  it  was  difficult  or  impos- 
sible to  get  by  other  methods  the  precise  or  particular  kind  of 
knowledge  which  was  wanted.  To  take  a concrete  example.  A life 
table  tells,  with  considerable  and  commendable  accuracy,  when 
men  aged  fifty-six,  say,  will  die,  on  the  average.  But  what  the 
family,  his  business  associates,  his  physician,  and  a good  number  of 
other  people  would  like  to  know  is  when  John  Particular  Smith, 
aged  fifty-six,  will  die,  and  this  statistics  are  unable  precisely  to 
tell.  No  honest  and  intelligent  person  can  be  deluded  into  the 
belief  that  “in  general”  or  “on  the  average”  knowledge  is  as  satis- 
factory or  useful  as  “individual”  knowledge  would  be  if  he  could  get 
it,  when  it  is  individuals  he  is  concerned  about,  as  is  mostly  the  case. 

A careful  consideration  of  the  history  of  statistical  science,  as 
well  as  of  the  present-day  application  of  these  methods,  leads  to  the 
conclusion  that  statistical  methods  are  used  for  two  sorts  of  pur- 
poses, or  to  gain  two  sorts  of  knowledge  about  events  or  things. 

(A)  On  the  one  hand  the  statistical  method  finds  one  of  its 
chief  uses  in  furnishing  a method  (and  the  only  one  known  in 
science)  of  describing  a group  in  terms  of  the  group’s  attributes, 
rather  than  in  terms  of  the  attributes  of  the  individuals  which 
compose  the  group. 

What  sort  of  positive,  definite,  and  exact  knowledge  do  statistics 
give  us? 

1.  Precise  knowledge  of  the  composition  of  groups  or  masses. 
This  is  the  knowledge  gained  by  counting.  Suppose  we  find  a basket 
containing  a number  of  balls  of  several  different  colors,  and  proceed 
to  count  them  with  the  following  results: 

7 Reds 
9 Whites 
2 Blacks 
1 Green

Such a count furnishes us at once with a great deal of perfectly
definite and precise information about this group or population
of balls. For example, the count tells us that it will never be
possible  to  take  away  from  the  basket  more  than  one  pair  of  balls 
of  which  one  member  is  green.  This  is  a definite  attribute  of  this 
population  which  may  be  used  to  differentiate  it  from  other  popula- 
tions. In  this  particular  population  only  one  green  ball  occurs. 
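
The count of the basket may be set down in a few lines. This is a sketch only; the variable names are assumptions made for the illustration, while the numbers are those given above.

```python
from collections import Counter

# The basket as counted in the text: 7 reds, 9 whites, 2 blacks, 1 green.
basket = ["red"] * 7 + ["white"] * 9 + ["black"] * 2 + ["green"] * 1

counts = Counter(basket)
total = sum(counts.values())

# Each pair having a green member uses up one green ball, so the number
# of such pairs that can be removed is limited by the number of greens.
max_pairs_with_green = min(counts["green"], total - counts["green"])

print(counts)
print("total balls:", total)
print("pairs with one green member:", max_pairs_with_green)
```

Nothing here is estimated or approximated: every figure is an exact attribute of this one population of balls.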

This  sort  of  knowledge  derived  by  counting  is  perfectly  definite 
and  precise  so  far  as  relates  to  the  particular  group  which  it  con- 
cerns in  any  particular  case.  It  does  not  involve  any  approxima- 
tion, or  probability,  and  is  as  precise  as  knowledge  of  the  indi- 
vidual. It,  however,  pertains  to  the  group.  It  forms  a part  of  a 
proper  scientific  description  of  a group  to  count  the  numbers  of  each 
of  the  different  kinds  of  elements  which  compose  it. 

2.  Knowledge  of  certain  abstract  qualities  of  groups.  This 
knowledge  is  obtained  by  calculation  from  the  data  got  by  counting. 
The more important of the abstract qualities* of groups are:

(a)  The  central  or  typical  condition  of  the  group;  or  the  condi- 
tion about  which  the  individuals  composing  the  group  cluster. 
This  is  variously  measured:  by  the  arithmetic  mean  or  average, 
which  gives  the  center  of  gravity  of  the  group;  by  the  median,  which 
tells  the  point  on  either  side  of  which  exactly  half  the  individuals 
fall;  by  the  mode,  which  tells  the  point  of  greatest  frequency  of 
occurrence  in  the  group,  etc. 

(b) The degree of individual diversity comprised in the group.
This  attribute,  called  the  variability  of  the  group,  is  again  variously 
measured:  by  standard  deviations,  coefficients  of  variation,  etc. 

(c)  The  degree  of  asymmetry  of  the  distribution  of  the  indi- 
viduals composing  the  group.  This  is  measured  by  the  skewness 
or  other  related  constants. 

(d)  Various  other  attributes  of  distributions  might  be  here 
included,  such  as,  for  example,  the  kurtosis,  but  for  purposes  of  the 
present  general  analysis  this  is  not  necessary.  Though  some  of 
these  attributes  involve  very  complex  mathematical  expressions  for 
their determination, the general fact remains clear that they are
all attributes of groups or masses which are described by the
statistical constants.

* A more detailed discussion of the following constants will be found in
Chapter XIII.
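
The constants named under (a), (b), and (c) may be computed for a small group as follows. This is a minimal sketch: the statures are hypothetical, and the use of the moment coefficient as the measure of skewness is an assumption of the example, since the text leaves the choice of constant open.

```python
import statistics

# Hypothetical statures in inches, invented for illustration only.
statures = [64, 66, 66, 67, 68, 68, 68, 69, 70, 73]

# (a) Central or typical condition of the group.
mean = statistics.mean(statures)
median = statistics.median(statures)
mode = statistics.mode(statures)

# (b) Variability. pstdev divides by n, which is appropriate when the
# group itself, and not some wider population, is being described.
sd = statistics.pstdev(statures)
cv = 100 * sd / mean  # coefficient of variation, in per cent

# (c) Asymmetry, here taken as the moment coefficient of skewness.
n = len(statures)
m2 = sum((x - mean) ** 2 for x in statures) / n
m3 = sum((x - mean) ** 3 for x in statures) / n
skewness = m3 / m2 ** 1.5

print(f"mean={mean}, median={median}, mode={mode}")
print(f"sd={sd:.3f}, cv={cv:.2f}%, skewness={skewness:.3f}")
```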

One  point  should  be  quite  clear.  It  is  that  the  kind  of  knowl- 
edge discussed  under  this  heading  2 is  just  as  definite  and  precise, 
and  involves  as  little  approximation  and  indeterminism,  as  does 
any  piece  of  individualistic  knowledge,  so  long  as  we  confine  atten- 
tion solely  to  the  particular  group  discussed  in  a particular  single 
case.  It  is  the  custom  to  state  means,  for  example,  with  probable 
errors.  But  this  is  only  because  it  is  proposed,  overtly  or  tacitly, 
to  extend  the  conclusions  beyond  or  outside  of  the  particular  group 
and  the  particular  instance  for  which  the  mean  was  calculated. 
For  that  group  and  that  instance  the  mean  is  perfectly  exact  and  pre- 
cise to  that  degree  of  precision  denoted  by  the  unit  of  measure 
used,  assuming  that  no  arithmetical  mistakes  have  been  made  in 
its  computation.  Thus  suppose  one  measures  the  stature  of  three 
men  to  the  nearest  inch,  and  then  calculates  the  average.  The 
result  is,  without  any  probable  error,  the  average  height,  at  the 
particular  moment  when  they  were  measured,  of  those  three  men 
exact  to  the  unit  of  measurement  used.  It  describes  and  measures 
precisely  an  attribute  of  those  men  considered  together  as  a group 
or  trio.  But  if  we  were  to  consider  this  result  from  the  viewpoint 
of  whether  it  gave  a reasonable  measure  of  the  average  height  of 
men  in  general,  or  from  the  viewpoint  of  whether  it  gave  a proper 
value  for  the  mean  height  of  these  men  when  repeatedly  measured 
under  varying  conditions,  it  would  clearly  be  subject  to  a large 
probable  error.  It  would,  in  point  of  fact,  have  lost  its  character 
of  precise  and  definite  knowledge,  and  have  become  a more  or  less 
poor  prediction,  approximation,  or  guess,  for  the  reason  that  three 
men  are  too  meager  a number  to  give  any  reliable  indication  of  the 
attributes  in  general. 
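
The case of the three men may be worked through as a sketch. The three statures are hypothetical. The probable error is taken, on the classical convention, as 0.6745 times the standard error of the mean; the divisor n - 1 used for the standard deviation is an assumption of the example.

```python
import math
import statistics

# Hypothetical statures of three men, measured to the nearest inch.
heights = [66, 68, 71]

# As a description of this trio the mean is exact to the unit used.
group_mean = statistics.mean(heights)

# As an estimate for men in general it carries a probable error.
# stdev divides by n - 1 (an assumed, modern convention).
standard_error = statistics.stdev(heights) / math.sqrt(len(heights))
probable_error = 0.6745 * standard_error

print(f"mean of the trio: {group_mean:.2f} inches")
print(f"as an estimate:   {group_mean:.2f} +/- {probable_error:.2f} inches")
```

The first line of output is descriptive knowledge of the group; the second is an inference, and with three men the margin attached to it is large relative to the differences of stature commonly in question.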

3.  Knowledge  of  the  degree  of  association  or  contingency  be- 
tween different  events  or  characters  within  a group.  This  is  fur- 
nished by  the  method  of  correlation  in  one  or  another  of  its  various 
forms.  By  this  general  method  it  is  possible  to  get  a numerical 
index  of  the  degree  of  likeness,  in  the  direction  and  amount  of  the 
variation  in  two  or  more  characters  in  the  individuals  composing  a 
group.  So  long  as  attention  is  confined  to  the  particular  group  on 
which the measurement is made, and to that group alone, and to a
single  instance  (in  time)  the  knowledge  gained  is  precise.  It  is  a 
part  of  the  description  of  the  attributes  of  that  group.  But  when 
we  endeavor  to  predict  from  that  particular  group  to  other  groups 
or  individuals  or  to  conditions  in  general,  our  results  are  no  longer 
precise,  but  inferential  and  what  are  called  probable  errors  tell  us 
something  about  the  degree  to  which  the  inference  may  be  regarded 
as  trustworthy. 
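
The distinction may be illustrated with the product-moment coefficient of correlation. This is a sketch under stated assumptions: the paired measurements are hypothetical, the function names are invented for the example, and the probable error is taken by the classical large-sample formula 0.6745(1 - r^2)/sqrt(n), which is only a rough guide for so small a group.

```python
import math

def pearson_r(xs, ys):
    """Product-moment correlation of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def probable_error_r(r, n):
    """Classical probable error of r: 0.6745 * (1 - r^2) / sqrt(n)."""
    return 0.6745 * (1 - r ** 2) / math.sqrt(n)

# Hypothetical statures (inches) and weights (pounds) of six men.
stature = [64, 66, 67, 68, 70, 72]
weight = [130, 140, 138, 150, 160, 171]

r = pearson_r(stature, weight)
pe = probable_error_r(r, len(stature))
print(f"r = {r:.3f} (exact for this group)")
print(f"r = {r:.3f} +/- {pe:.3f} (as an inference to other groups)")
```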

Summarizing  the  results  of  the  above  analysis,  we  see  that  the 
statistical  method  can 

1.  Furnish  precise  descriptive  knowledge  about  groups.  This 
knowledge  is  of  various  sorts.  It  is  definite  and  precise  so  long  as 
attention  is  confined  solely  to  the  particular  group  and  the  par- 
ticular instance  on  which  it  is  based. 

2.  The  knowledge  gained  by  the  statistical  method,  as  we  have 
analyzed  it  above,  precise  though  it  may  be,  pertains  to  the  group 
and  not  to  the  individual.  It  is  exact  knowledge  about  the  composi- 
tion, or  attributes,  or  contingencies  of  masses  or  groups. 

3.  This  ability  to  describe  groups  in  terms  of  the  groups’  own 
attributes,  which  is  an  unique  property  of  the  statistical  method,  is 
extremely  useful  in  the  practical  conduct  of  scientific  investigation. 
It  makes  the  statistical  method  a valuable  adjunct  to  every  other 
scientific  method,  and  particularly  to  the  experimental. 

(B)  We  may  now  turn  to  a wholly  different  aspect  of  the 
statistical  method,  wherein  it  is  used  for  the  purpose  of  predicting 
or  estimating  the  probable  or  the  approximate  condition  in  the 
individual  from  a statistical  examination  of  the  condition  in 
the  mass  or  the  group.  Resort  is  had  to  the  statistical  method 
for  this  purpose  primarily  in  those  cases  where  the  outcome  of  the 
event,  or  the  condition  of  the  thing  in  a particular  individual  case 
cannot  be  directly  determined  by  direct  examination  of  that 
particular  individual,  because  of  spatial  or  temporal  limitations  im- 
posed by  the  nature  of  the  problem;  and  also  where  the  outcome 
of  the  event  or  the  condition  of  the  thing  is  determined  by  the 
combined  action  of  a large  number  of  small  causes,  each  about 
equally  influential  upon  the  final  result. 

Originally  the  statistical  method  was  only  employed  for  this 
second purpose in cases where, because of the multiplicity of the
cause  groups  involved  in  the  determination  of  the  event,  and  the 
consequently  small  effect  of  each,  it  was  impossible  to  make  any 
reasonable  prediction  regarding  an  individual  from  an  examination 
of  that  individual  alone.  Such  employment  might  be  considered 
legitimate,  though  not  very  fruitful,  on  the  ground  that  any  pre- 
diction so  made,  uncertain  and  doubtful  as  it  may  be,  is  after  all 
perhaps  better  than  no  prediction  at  all.  As  time  has  gone  on, 
however,  there  has  been  an  increasing  tendency  to  assume  that 
this  use  of  the  statistical  method  has  general  a priori  validity  and 
can  be  profitably  employed  in  all  sorts  of  cases. 

This  leads  us  to  consider  carefully  the  general  question  of  the 
validity,  on  the  one  hand,  and  the  usefulness,  on  the  other  hand,  of 
this  whole  second  mode  of  employment  of  the  statistical  method. 
It  is  the  one  which  has  attracted  the  greatest  attention  because  of  its 
essentially  spectacular  nature  coupled  with  a sort  of  mysteriousness 
bordering  upon  the  miraculous.  It  seems  a wonderful,  indeed  almost 
a superhuman,  accomplishment  to  be  able  to  say  in  the  manner  of 
the  oracles  of  old,  “So  many  men  will  commit  suicide  next  year.” 

Since  Clerk-Maxwell  introduced  statistical  modes  of  reasoning 
into  physical  science  there  has  been  an  ever-increasing  tendency 
to  regard  the  universe  as  organized  on  a statistical  plan.  This 
has  come,  by  gradual  evolution,  to  carry  with  it  two  implications, 
one  of  which  seems  quite  fallacious  and  the  other  partly  so. 

The  first  of  these  is  that  the  individual  events,  of  which  all  the 
causes  are  not  precisely  known  to  us,  are  indeterminate.  Such  an 
assumption  is,  of  course,  unwarranted.  Because  we  do  not  know 
all  the  causes  leading  to  a particular  event  does  not  mean  that 
that  event  is  any  less  precisely  determined  by  the  course  of  ante- 
cedent events.  Consider  a box  containing  100  consecutively  num- 
bered cards.  Suppose  one  card  were  to  be  drawn  and  that  it  bore 
the  number  36.  It  would  be  quite  impossible  to  formulate  pre- 
cisely all  the  causes  which  led  to  the  drawing  of  the  number  36  on 
the  particular  occasion  considered,  but  it  is  equally  impossible  to 
conceive  that  this  particular  drawing  was  not  definitely  “caused.” 
In other words, there clearly was a whole train of antecedent circumstances,
which taken all together definitely resulted, and could
only have resulted, in the drawing of the number 36. The too prevalent
conclusion that the application of the statistical method or
statistical  modes  of  thought  implies  phenomenal  indeterminism  in 
the  individual  case  seems  to  be  totally  fallacious.* 
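
A loose modern analogy, offered as an assumption of this illustration and not as part of the argument above, is the pseudo-random number generator. The draw below looks like chance to anyone ignorant of the seed, yet it is completely fixed by it; the seed stands for the train of antecedent circumstances, and the function name `draw_card` is hypothetical.

```python
import random

def draw_card(seed):
    """Draw one card from a box of 100 consecutively numbered cards.

    The seed plays the part of the antecedent circumstances: the same
    seed always leads to the same card.
    """
    rng = random.Random(seed)
    return rng.randint(1, 100)

first = draw_card(seed=7)
second = draw_card(seed=7)
print(first, second, first == second)
```

Ignorance of the seed makes the result unpredictable to the observer; it does not make the result undetermined.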

The  second  currently  accepted  implication  of  a statistical  view 
of  the  universe  is  that  in  general  a particular  event  or  phenomenon 
is  the  outcome  of  the  combined  action  of  a great  number  of  causes, 
each  of  which  alone  produced  but  a small  part  of  the  final  total  effect. 
There  is  clearly  so  much  truth  in  this  point  of  view  as  is  included 
in  the  fact  that  individual  events  or  phenomena  do,  in  some  degree 
or  other,  vary,  and  further  these  variations  in  general  distribute 
themselves  more  or  less  in  accord  with  well-known  laws  of  error. 
But  the  assertion  that  events  are  individually  the  outcome  of  the 
action  of  great  numbers  of  causes,  each  of  which  had  a small  part 
and a part significantly equal to that played by every other one of
the  causes  concerned  in  the  final  result,  appears  upon  examination 
to  be  true  only  if  the  “universe  of  discourse”  is  indefinitely  extended 
in  time.  But  practically  science  works  in  a definitely  and  rather 
narrowly  limited  universe  of  discourse  so  far  as  concerns  time.  It 
undoubtedly  is  true  that  a vast  number  of  small  causes  do  play  a 
part  in  the  determination  of  any  particular  event.  But,  in  many 
of  the  events,  at  least,  in  which  science  is  interested,  these  multi- 
tudinous minor  causes  do  not  play  any  significant  part  in  the  differ- 
ential determination  at  a particular  instant  of  time.  There  is  in 
connection  with  the  causation  of  most  events  some  one  or  two,  or 
at  most  a very  few,  outstanding  cause  groups,  which,  for  all  prac- 
tical purposes,  at  a given  moment  completely  determine  their 
occurrence.  The  total  effect  of  all  the  vast  number  of  other  minor 
causes  concerned  in  the  remote  past  is  so  minute,  as  compared 
with  the  part  played  by  the  really  determinative  ones  at  the 
moment,  as  to  be  negligible.  In  other  words,  all  natural  cause 
groups  are  not  small,  nor  of  equal  (balanced)  value  in  the  final 
determination  of  the  event  to  which  they  relate,  provided  we  confine 
ourselves  to  the  time  limits  of  finite  practical  operations. 
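
The contrast between many small balanced causes and a few large ones may be sketched by simulation. Everything in the sketch is an assumption made for illustration: the function name, the sizes of the causes, and the use of the ratio of variances as a rough measure of how far one major cause accounts for the differences among outcomes.

```python
import random
import statistics

def major_share(n_events, n_minor, major_size, seed=0):
    """Approximate share of outcome variance due to one major cause.

    Each event is the sum of one major cause (+/- major_size) and
    n_minor small causes, each drawn uniformly from -1 to +1.
    """
    rng = random.Random(seed)
    major, total = [], []
    for _ in range(n_events):
        d = major_size * rng.choice([-1, 1])
        minor = sum(rng.uniform(-1, 1) for _ in range(n_minor))
        major.append(d)
        total.append(d + minor)
    return statistics.pvariance(major) / statistics.pvariance(total)

# Fifty small balanced causes alone, then the same with one large cause.
print("no major cause: ", major_share(2000, 50, major_size=0.0))
print("one major cause:", round(major_share(2000, 50, major_size=20.0), 3))
```

When the major cause is present, the fifty minor causes still act in every event, but they contribute very little to the differences between one outcome and another, which is the sense of "differential" used in the text.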

* It  should,  however,  be  said  that  there  are  a few  men  of  science  and  phil- 
osophers who  take  the  view  of  phenomenal  indeterminism.  Their  view  has  not 
won general acceptance.

The fact that all natural causes or cause groups are not equally
significant  quantitatively  is,  of  course,  what  makes  the  experimental 
method fruitful — one might perhaps even say possible — in science.
The  very  essence  of  the  experimental  method  is  that  the  conditions 
for  the  happening  of  an  event  are  so  arranged  that  the  influence  of 
one  putative  causal  factor  or  a very  limited  number  of  such  factors 
may  be  tested  at  a time.  If  with  a radical  change  in  this  one  factor, 
whilst  all  others  remain,  so  far  as  may  be,  constant,  no  change  in 
the  happening  of  the  event  is  observed,  the  experiment  has  shown 
that  this  particular  factor  has  no  significant  causal  relation  to  the 
happening  of  the  event.  If  a marked  change  in  the  happening  of 
the  event  is  observed  always  to  follow  the  change  of  conditions  of 
operation  of  the  factor  under  investigation,  then  clearly  this  factor 
plays  a determinative  part.*  In  other  words,  it  is  a fundamental 
logical  prerequisite  of  the  experimental  method  if  it  is  to  be  suc- 
cessful (that  is,  contribute  to  knowledge)  that  it  operate  in  a universe 
in  which  all  causal  factors  are  not  of  equal  quantitative  significance 
at  any  given  instant  of  time. 

Clearly  experimental  analysis  of  this  sort  would  have  quickly 
discovered,  if  the  common  sense  of  men  had  not  long  previously 
shown,  that  the  course  which  a particular  event  is  going  to  take  is 
not  immediately  the  result  of  the  action  of  an  indefinitely  large 
number  of  individually  insignificant  causal  factors,  but  that  it  is 
the  outcome  of  the  action  of  a few  immediately  determinative 
factors,  and  the  effect  of  the  indefinitely  large  number  of  historic- 
ally antecedent  small  causes  is  insignificant  in  the  sense  of  being 
differential. Generalized, the point may be put in this way: an
event A is about to happen. It may happen in any one of n different
ways, each one of which ways may be designated by a letter,
l, p, r, t, etc. Now an indefinitely large number of causes are
concerned in bringing it about that the event A is going to happen,
and that it can equally well happen either as l, p, r, t, etc. In other
words,  the  setting  of  the  stage  for  the  event  has  involved  a vast 
number  of  small  and  balanced  causes.  But  the  causes  which  are 
differential in the particular case, that is, which determine that A
shall happen in the p way this particular time, and not in the l,
the t, or any other way, are, in general:

1. Few in number.

2. Immediate in time.

3. Large in relative quantitative effect.

* Cf. Jennings’ valuable paper on radical experimental analysis, American
Naturalist, vol. 47, pp. 349-360, 1913.

The  point  under  discussion  may  perhaps  be  made  plainer  by  a 
homely  illustration.  Suppose  a man  steps  up  behind  a mule  and 
prods  the  creature  with  his  walking  stick.  The  human  intellect 
is  unequal  to  the  task  of  predicting  exactly,  in  the  particular  case, 
what  precise  portion  of  the  man’s  body  the  mule’s  hoof  will  land 
upon.  A multitude  of  minor  causes  will  affect  this:  the  relative 
height  of  the  man  and  the  mule,  the  age  of  each,  the  place  poked 
with  the  walking  stick,  the  degree  of  fatigue  of  the  mule,  the  tem- 
perature, the  season  of  the  year,  and  countless  other  things  have 
an  influence  in  determining  just  the  precise  spot  where  the  mule’s 
foot  and  the  man’s  body  come  together.  These  could  be  investi- 
gated statistically  and  tables  drawn  up  from  which  one  could  pre- 
dict the  part  of  the  man  which  would  probably  receive  the  hoof. 
But  what  a silly,  futile  piece  of  business  this  all  would  be,  since 
clearly  the  influence  of  all  these  small  causes  on  what  happens  to 
the  man  is  stupendously  overshadowed  by  the  results  of  two  factors, 
namely,  putting  himself  behind  the  mule  and  prodding  the  animal 
with  a stick.  Of  course,  a vast  number  of  antecedent  causes  are 
involved  in  the  setting  of  the  stage,  but  these  are  not  differential 
in  the  determination  of  the  end-event  of  the  series. 

The preceding illustration has nothing directly to do with science,
but the essential point involved operates in the use of the
statistical method as a weapon of scientific research. This method,
being  only  a descriptive  method,  tells  us  nothing  directly  about  the 
causes  involved  in  the  determination  of  any  events  or  phenomena 
under  consideration.  It  may  be  of  great  aid,  in  combination  with 
the  experimental  method,  in  helping  to  arrive  at  such  knowledge, 
but  alone  and  of  itself  it  cannot  directly  furnish  knowledge  of  causes 
of  individual  events.  Yet  the  statistical  method,  particularly  in 
that  phase  of  it  which  we  have  here  under  discussion,  which  essays 
to predict the probable condition of the individual from the
knowledge of the mass, seems to furnish information about causes. It
wears a specious air of bringing a kind of knowledge which in
reality it not only never does, but from the very nature of the case
never can furnish.

Let  us  consider  now  a little  more  in  detail  the  nature  of  the 
prediction  of  the  probable  condition  of  the  individual  from  a knowl- 
edge of  the  mass  or  group.  It  has  been  shown  that  statistics  give 
perfectly  definite  and  precise,  and  often  very  useful,  knowledge 
about  masses  or  groups.  We  are  now,  however,  not  concerned  with 
this  as  group  knowledge,  but  rather  with  one  use  to  which  such 
knowledge  has  been  put.  This  use  is  that  which  is  comprised  in 
the  subject  of  statistical  probabilities,  and  which  involves  the 
drawing  of  conclusions  as  to  the  probable  condition  of  the  indi- 
vidual, based  on  an  exact  knowledge  of  a particular  mass  or  group. 

In  order  to  approach  the  subject  in  the  simplest  way  let  us 
consider  a concrete  case.  Suppose  a problem  of  the  following  sort 
were to be set for answer: What is the probability that, at some
chosen moment of time, the next birth to occur in, let us say, the
city of Baltimore, will be of a white child? Now if we look at this
as  a question  of  statistical  probability  the  appropriate  way,  of 
course,  to  go  about  solving  it  is  to  turn  up  the  registration  reports 
for  the  city  of  Baltimore  covering  a period  of  years,  and  find  out 
what  is  the  proportion  of  white  to  colored  births  in  that  city.  Then 
by  the  simplest  theorem  in  the  calculus  of  chance,  the  probability 
that  any  single  birth  in  Baltimore,  taken  by  itself,  will  be  of  a white 
child  is  conventionally  regarded  as  given,  in  principle,  by  a fraction 
of  which  the  numerator  is  the  number  of  white  children  born  in 
Baltimore  and  the  denominator  is  the  total  number  of  children 
born  in  Baltimore,  both  figures  including  the  same  period  of  time. 
When  we  have  obtained  such  a fraction  we  have  a definite  piece  of 
statistical  knowledge,  but  of  just  what  use  is  it  so  far  as  concerns 
a particular  individual  case,  the  “next”  birth?  It  implies  no  bio- 
logical knowledge  of  any  kind;  no  knowledge  of  the  laws  of  heredity. 
It  really  adds  essentially,  it  seems  to  me,  to  the  sum  total  of  the 
world’s  knowledge  only  one  thing.  That  thing  is  the  proper  betting 
odds  on  what  the  color  of  the  next  child  born  in  the  city  will  be. 
This  knowledge  would  be  really  useful,  in  a pragmatic  sense,  only 
provided some one wishes to gamble upon that event.

Of course the statistical count, on which the probability is based,
in  itself  furnishes  definite  and  precise  information  about  the  popula- 
tion of  Baltimore,  as  a population.  This  may  be  useful.  What  we 
are  now  considering,  though,  is  knowledge  about  individual  cases. 
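
The arithmetic of the fraction described above, and of the betting
odds which follow from it, may be sketched in a few lines. The counts
used here are purely hypothetical, invented for illustration and taken
from no registration report, and the function names are likewise only
illustrative assumptions.

```python
from fractions import Fraction


def probability_from_counts(favourable: int, total: int) -> Fraction:
    """Statistical probability as the ratio of favourable cases to all cases."""
    if total <= 0 or not 0 <= favourable <= total:
        raise ValueError("need 0 <= favourable <= total and total > 0")
    return Fraction(favourable, total)


def fair_odds(p: Fraction) -> tuple[int, int]:
    """Fair betting odds (for, against) implied by a probability p."""
    if not 0 < p < 1:
        raise ValueError("odds are defined only for 0 < p < 1")
    odds = p / (1 - p)
    return odds.numerator, odds.denominator


# Hypothetical counts, invented for illustration only.
births_of_given_class = 7_500
all_births = 10_000

p = probability_from_counts(births_of_given_class, all_births)
print(p, fair_odds(p))  # 3/4 (3, 1)
```

With these assumed counts the fraction is 3/4, and the only further
thing it yields about the single "next" case is odds of 3 to 1, which
is the point of the argument in the text.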

Let  us  see  what  a totally  different  kind  of  ability  to  predict  the 
future  event  in  an  individual  case  is  gained  when  we  take  into 
account  one  single  biological  fact  of  an  individualistic  instead  of  a 
statistical  character.  Suppose,  that  is  to  say,  that  we  are  informed 
that  the  mother  of  the  next  baby  to  be  born  in  Baltimore  is  a 
black  woman.  It  needs  no  argument  to  show  how  much  more 
precise  will  be  the  prediction  as  to  the  color  of  the  next  baby  under 
these  conditions. 

This  illustration  brings  out  clearly  the  difference  between  the 
two  possible  bases  for  the  prediction  of  a future  event.  On  the  one 
hand,  such  prediction  may  be  based  merely  on  statistical  ratios. 
This  means  only  a count  of  an  indefinitely  large  past  experience 
regarding  the  occurrence  or  failure  of  the  event,  but  in  no  way  takes 
into  account  the  causes  which  underlie  the  happening  of  the  event 
in  any  particular  case.  On  the  other  hand,  we  have  the  prediction 
which  is  based  on  a definite  knowledge  of  the  determinative  causes 
which  bring  about  the  happening  of  a particular  individual  event 
of  the  sort  in  which  we  are  interested  and  about  which  we  are  to 
predict.  There  can  be,  it  would  seem,  no  comparison  between  the 
usefulness,  in  the  pragmatic  sense,  of  these  two  kinds  of  knowledge. 
The  statistical  knowledge  on  which  a statistical  prediction  is 
made  is  essentially  the  most  sterile  kind  of  knowledge  that  one  can 
possibly  have  so  far  as  concerns  the  individual  event.  It  merely  gives 
one  the  betting  odds  for  or  against  the  occurrence  of  that  event, 
and  absolutely  nothing  more.  Now  a wager,  however  large,  in  the 
scientific  sense  neither  discovers,  expounds,  nor  is  a criterion  of  the 
truth.  Bets,  in  other  words,  are  not  evidence,  though  the  statis- 
tician sometimes  seems  to  forget  this,  and  to  deal  with  statistical 
ratios  as  though  they  had  probative  worth  in  regard  to  individual 
phenomena. 

On  the  other  hand,  a prediction  based  on  experimentally  ac- 
quired knowledge  of  the  determinative  cause  of  the  individual 
event brings with it a more realistic knowledge of a natural
phenomenon. The predictions so made may not always turn out
correct, but when they do not, the failure incites us to investigate the
particular  disturbing  factor  which,  under  such  circumstances,  may 
overwhelm  the  normally  determinative  cause  of  a particular  event. 

Man soll das Kind nicht mit dem Bade verschütten (one should not throw
the baby out with the bath). The critical reader may be inclined to
think that this is exactly what the discussion in this section has
done. If, as has there been suggested,
that  part  of  the  statistical  method  which  uses  the  calculus  of 
probability  as  a basis  for  the  prediction  of  future  events,  gives  only 
a knowledge  of  betting  odds,  one  may  ask:  What  about  the  whole 
concept  of  probable  error?  The  value  of  this  concept  in  scientific 
research  is  unquestioned.  Yet  plainly  the  whole  concept  has  its 
basis  in  the  calculus  of  probability.  Has  not  our  discussion  led  us 
unwittingly  into  a serious  contradiction? 

I think  not.  Let  us  examine  the  probable  error  concept  a little 
more  carefully.  Suppose  we  read  that  the  mean  length  of  the  thorax 
of a thousand fiddler crabs is 30.14 ± 0.02 mm. Just what does
this  actually  mean?  Accepting  the  figures  at  their  face  value,  or, 
put  another  way,  assuming  for  the  argument  that  the  mathematical 
theory  on  which  the  probable  error  was  calculated  was  the  correct 
one,  the  figures  mean  something  like  this:  If  one  were  to  take, 
quite  at  random,  successive  samples  of  1000  each  of  fiddler  crabs 
and  determine  the  mean  thoracic  length  from  each  sample,  these 
means  would  all  be  different  from  each  other  by  varying  amounts. 
In  other  words,  no  single  sample  would  give  us  the  absolutely  true 
value  of  the  mean  thoracic  length  of  all  fiddler  crabs  in  the  world. 
This  true  value  is  in  an  absolute  sense  unknowable,  because,  for 
one  reason,  always  we  must  come  at  the  finding  of  it  by  the  way  of 
random  sampling,  and  sampling  means  variation.  Now  it  is  an 
observed  fact  of  experience  that  the  variations  due  to  random 
sampling  distribute  themselves  according  to  definite  laws  of  mathe- 
matical probability.  Knowing  such  laws,  it  is  clearly  possible  to 
state  the  mathematical  probability  for  (or  against)  any  particular 
deviation  or  variation  occurring  as  the  result  of  random  sampling. 
Exactly this is what the probable error does. It says, in the
particular case here considered, that it is an even chance that a
deviation or variation in the value of the mean as great as or greater
than 0.02 mm. above or below will occur as a result of random sampling.
Or, put in another way, it is an even bet that the value of the mean
thoracic length of fiddler crabs in general will fall between 30.14 +
0.02 = 30.16, and 30.14 - 0.02 = 30.12.
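
The "even chance" property can be checked by a small sampling
experiment. The sketch below assumes a normally distributed population,
for which the probable error of a mean is 0.6745 times its standard
error; the population standard deviation, the sample size, and the
function names are assumptions made only for the sake of a quick
illustration, and are not the figures behind the fiddler crab example.

```python
import random
import statistics

# For a normal distribution, half of all deviations lie within
# 0.6745 standard deviations of the mean.
PROBABLE_ERROR_FACTOR = 0.6745


def probable_error_of_mean(sample):
    """Probable error of a sample mean: 0.6745 * s / sqrt(n)."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    return PROBABLE_ERROR_FACTOR * statistics.stdev(sample) / n ** 0.5


def share_of_means_within_pe(true_mean, true_sd, n, trials, seed=0):
    """Draw repeated random samples of size n from a normal population and
    return the share of sample means lying within one probable error of
    the true mean."""
    rng = random.Random(seed)
    pe = PROBABLE_ERROR_FACTOR * true_sd / n ** 0.5
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
        if abs(statistics.fmean(sample) - true_mean) <= pe:
            hits += 1
    return hits / trials


# Hypothetical population: the mean matches the text; the standard
# deviation and sample size are assumed so that the run is quick.
share = share_of_means_within_pe(true_mean=30.14, true_sd=1.0, n=100, trials=2000)
print(round(share, 2))  # close to 0.5, i.e. an even bet
```

About half of the simulated means fall inside the band and half
outside it, which is all that the statement "30.14 ± 0.02" asserts.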

Now  all  the  knowledge  that  this  probable  error  furnishes  is 
this: that if a man were to say, “I’ll bet a thousand dollars that the
true  mean  thoracic  length  of  the  population  from  which  this  sample 
of  fiddler  crabs  was  drawn  is  either  over  30.16  mm.  or  under  30.12 
mm.”  one  would  not  be  justified  in  offering  odds.  He  could  wager 
on  even  terms.  Either  party  involved  in  the  transaction  would  be 
as  likely  to  lose  (or  to  win)  as  the  other. 

Putting  the  case  in  this  way,  it  is  clear  that  this  kind  of  knowledge 
which  comes  from  an  examination  of  probable  errors  is  the  same  as 
that  discussed  above.  It  is  a knowledge  of  betting  odds.  It  has  no 
necessary  relation  per  se  to  any  physical,  chemical,  or  biological  laws. 
It  merely  informs  one  how  he  may  safely  gamble  on  an  event  if  he 
is  so  minded  and  can  find  some  one  else  ready  to  do  the  same  thing. 

Wherein  lies  the  value  of  the  probable  error  concept  for  science, 
then?  Simply  in  that  it  serves  as  a test  or  check  on  every  mode  of 
research  in  science.  So  far  as  I can  see,  the  calculus  of  probability, 
in  and  of  itself  alone,  is  not  and  never  can  be  an  effective  weapon 
of  research  for  the  discovery  of  truth  in  phenomenal  science,  be 
it  physical  or  biological.  Yet  it  operates  as  an  ever-present  test  of 
the  trustworthiness  of  the  results  obtained  by  modes  of  research 
which  are  in  themselves  adapted  to  making  discoveries  about 
phenomena.  The  student  of  probability  says  something  like  this 
to  the  experimentalist:  “Yours  is  the  way  to  find  out  the  sig- 
nificant underlying  causes  of  phenomena.  Let  it  be  practiced  with 
all  zeal,  but  let  it  be  remembered  that  you  operate  in  a finite  uni- 
verse, and  consequently  all  your  results  are  subject  to  such  fluc- 
tuations and  variations  as  experience  has  shown  arise  from  random 
sampling.  I regret  that  I cannot  directly  and  alone  discover  sig- 
nificant causes,  but  at  any  rate  I can  furnish  you  a test  whereby 
you  may  reasonably  judge  whether  your  result  is  significantly 
influenced  by  these  fluctuations  of  random  sampling.” 

To sum the whole matter up: It seems that the statistical
method in science has been used to do two things.

The first of these is a unique function of the method: to furnish
a description of a group of objects or events in terms of the
group’s  attributes  rather  than  those  of  the  individuals  composing 
the  group.  Herein  lies  the  great  value  of  the  statistical  method. 
It  is,  however,  a descriptive  method  only  and  has  the  limitations 
as  a weapon  of  research  which  that  fact  implies. 

The  second  purpose  that  the  statistical  method  has  been  called 
upon  to  accomplish  is  the  prediction  of  the  individual  case  from  a 
precise  knowledge  of  the  group  or  mass.  This  involves  something 
really  additional  to  the  statistical  method  per  se,  namely,  the  mathe- 
matical theory  of  probability.  We  have  seen  that  this  side  of  the 
statistical  method  gives  only  a somewhat  sterile  kind  of  knowledge 
so  far  as  concerns  individuals,  namely,  a knowledge  of  betting  odds. 
The  theory  of  probability  grew  up  about  the  gaming  table,  not  in 
the  laboratory.  Its  place  in  the  methodology  of  science  is  not  an 
independent  one.  By  it  alone  one  cannot  discover  new  truths  about 
phenomena.  But  it  is  a highly  important  adjunct  to  other  modes 
of  research. 

Plainly,  however,  one  cannot  regard  statistical  knowledge  in 
general  as  a higher  kind  of  knowledge  than  that  derived  in  other 
ways.  Nor  is  the  statistical  method  to  become  the  dominant  or 
exclusive  method  of  science,  though  it  will  always  be  useful,  and 
in  many  fields  an  essential  method.  It  will  find  its  chief  useful- 
ness, first  in  its  sphere  of  furnishing  shorthand  description  of  groups, 
and second in furnishing a test of the probable reliability of
conclusions.

CHAPTER II


SOME  LANDMARKS  IN  THE  HISTORY  OF  VITAL 

STATISTICS 

In the earlier volumes of the Journal of the Royal Statistical
Society, those mines of curious information, a favorite form of
contribution was the “tabular résumé,” which presented a series
of more or less statistical facts on a chronologic base. So
distinguished a precedent seems to justify the use of the same
method to furnish a bird’s-eye view of the development of
biostatistics itself. Consequently the table which follows has been
prepared.

TABULAR  REVIEW  OF  THE  HISTORY  OF  VITAL  STATISTICS 

This “tabular résumé” attempts to set forth in chronologic
array  what  the  passage  of  time  has  shown  to  be  some  of  the  most 
important  landmarks  in  the  history  of  biostatistics.  To  disarm 
in  some  measure  criticisms,  which  from  the  standpoint  of  the  pro- 
fessional historian  would  otherwise  be  undoubtedly  merited,  it  may 
be  said,  first,  that  there  has  been  no  slightest  thought  of  encom- 
passing within  this  short  table  a complete  history  of  the  subject. 
Historic  completeness  and  the  tabular  form  of  presentation  do 
not  go  well  together.  The  object  of  the  present  table  is  much 
simpler.  It  is  to  get  before  the  student  the  briefest  conspectus  of 
the  time  relations  of  the  development  of  the  subject,  on  the  one 
hand,  and  of  the  personalities  concerned  in  a large  pathbreaking 
way  in  this  development,  on  the  other  hand.  The  precise  manner 
in which such a purpose will be carried out will obviously be different
for  each  person  who  attempts  it.  One  person’s  estimate  as  to  the 
relative  historic  significance  of  a particular  event  or  personality 
will  differ  from  another’s.  In  any  event,  it  seems  clear  that  any 
historic review of vital statistics would be bound to contain at
least a good many of the items of the present table. More than
this in the way of agreement among scholars on a historic matter
it is doubtless idle to hope for.

TABULAR REVIEW OF SOME OF THE IMPORTANT EVENTS IN THE
HISTORY OF VITAL STATISTICS

| Year | Event | Personality concerned | Authority for record |
|------|-------|-----------------------|----------------------|
| 1532 | First definitely known compilation of weekly bills of mortality in London. | | Hull, C. H., Econ. Writ. of Sir Wm. Petty, p. lxxxi. |
| 1539 | Beginning of official registration of baptisms, marriages and deaths in France. | | Faure, F., Hist. Stat., p. 242. |
| 1608 | Beginning of oldest parish register in Sweden. | | Arosenius, E., Hist. Stat., p. 537. |
| 1657 | Publication of De Ratiociniis in Ludo Aleae, the first printed work on games of chance. | Christiaan Huygens (1629-1695). | Walker, H. M., Hist. Stat. Meth., p. 7. |
| 1662 | Publication of first edition of “Natural and Political Observations Mentioned in a following Index, and made upon the Bills of Mortality.” | Capt. John Graunt, Citizen of London (1620-1674). | Hull, C. H., Econ. Writ. of Sir Wm. Petty, p. 315. |
| 1666 | First Census of Canada (the earliest modern census of population). | | Godfrey, E. H., Hist. Stat., p. 179. |
| 1669 | Application of mathematical theory of probability to expectation of human life. | Christiaan Huygens (1629-1695). | Stuart, C. A. V., Hist. Stat., p. 430. |
| 1693 | Publication of “Estimate of the Degrees of Mortality of Mankind,” in the Philosophical Transactions of the Royal Society. | Halley, the astronomer (1656-1742). | Hull, Loc. cit., p. lxxvii. |
| 1713 | Publication of “Physico-theology; or a demonstration of the Being and Attributes of God from his Works of Creation.” | Rev. William Derham (1657-1735). | Hull, Loc. cit., pp. lxxvi and lxxviii. |
| 1718 | Publication of the “Doctrine of Chances.” | A. DeMoivre (1667-1754). | Art. DeMoivre, Encyc. Brit. |
| 1733 | Publication of Approximatio ad Summam Terminorum Binomii (a + b)^n in Seriem expansi, the discovery of the normal curve. | A. DeMoivre (1667-1754). | Pearson, K., Biometrika, xvi, p. 402. |
| 1735 | Registration of vital statistics begun in Norway. | | Kiaer, A. N., Hist. Stat., p. 447. |
| 1741 | Publication of “Die göttliche Ordnung in den Veränderungen des menschlichen Geschlechts aus der Geburt, dem Tode und der Fortpflanzung desselben erwiesen, etc.” | Johann Peter Süssmilch (1707-1767). | Hull, Loc. cit., p. lxxviii. |
| 1746 | Publication of the first French tables of mortality under the title “Essai sur les probabilités de la durée de la vie humaine.” | Deparcieux. | Faure, F., Loc. cit., p. 265. |
| 1748 | Beginning of Swedish official vital statistics. | | Arosenius, E., Hist. Stat., p. 540. |
| 1749 | First complete Census of Sweden. | | Rossiter, W. S., Cent. Pop. Growth, p. 2. |
| 1753 | First Census of population in Austria ordered. | | Meyer, R., Hist. Stat., p. 85. |
| 1769 | First population Census of Denmark and Norway. | | Jensen, A., Hist. Stat., p. 201. |
| 1790 | First federal Census of the United States. | | |
| 1795 | First Census of the Netherlands. | | Stuart, C. A. V., Hist. Stat., p. 43. |
| 1797 | Establishment of Danish-Norwegian Tabulating Office. | | Jensen, A., Loc. cit., p. 201. |
| 1798 | First complete Census of Spain. | | Rossiter, W. S., Cent. Pop. Growth, p. 2. |
| 1801 | First complete Census of Great Britain. | | Rossiter, W. S., Loc. cit. |
| 1801 | First complete Census of France. | | Rossiter, W. S., Loc. cit. |
| 1805 | Formation of first statistical state office within boundaries of German Empire. | | Würzburger, E., Hist. Stat., p. 3. |
| 1810 | First complete Census of Prussia. | | Rossiter, W. S., Loc. cit. |
| 1812 | Publication of “Théorie analytique des probabilités.” | Pierre Simon Laplace (1749-1827). | Encyc. Brit. Art., Laplace. |
| 1812 | Inauguration of civil registration of births, marriages and deaths in the Netherlands. | | Stuart, C. A. V., Hist. Stat., p. 432. |
| 1812 | Publication of “Theoria combinationis observationum erroribus minimis obnoxia” (Least squares). | Karl Friedrich Gauss (1777-1855). | Encyc. Brit. Art., Gauss. |
| 1815 | First complete Census of Norway. | | Rossiter, W. S., Loc. cit. |
| 1815 | First complete Census of Saxony. | | Rossiter, W. S., Loc. cit. |
| 1816 | First complete Census of Baden. | | Rossiter, W. S., Loc. cit. |
| 1818 | First complete Census of Austria. | | Rossiter, W. S., Loc. cit. |
| 1818 | First complete Census of Bavaria. | | Rossiter, W. S., Loc. cit. |
| 1825 | Publication of “Mémoire sur les lois des naissances et de la mortalité à Bruxelles,” Quetelet’s first statistical paper. | Lambert Adolph Jacques Quetelet (1796-1874). | Lottin, Quetelet, p. xx. |
| 1826 | Establishment of statistical commission in Belgium. | Ed. Smits. | Julin, A., Hist. Stat., p. 126. |
| 1829 | First official Census of Belgium. | Ed. Smits. | Julin, A., Hist. Stat., p. 128. |
| 1832 | Publication of “Recherches sur la reproduction et sur la mortalité de l’homme aux différents âges et sur la population de la Belgique d’après la recensement de 1829 (premier recueil officiel des documents statistiques).” | Quetelet and Smits. | Lottin, Loc. cit., p. xxi. |
| 1834 | Royal Statistical Society (London) founded. | | Title page of Journal. |
| 1835 | Publication of “Sur l’homme et le développement de ses facultés, ou Essai de physique sociale.” | Lambert Adolph Jacques Quetelet (1796-1874). | Lottin, Loc. cit., p. xxi. |
| 1836 | First complete Census of Greece. | | Rossiter, W. S., Loc. cit. |
| 1837 | Civil registration of vital statistics in England. Establishment of office of Registrar-General. | | Baines, A., Hist. Stat., p. 370. |
| 1838 | Publication of “Essay on Probabilities” in Lardner’s Encyclopedia. | Augustus DeMorgan (1806-1871). | Encyc. Brit. Art., DeMorgan. |
| 1838 | Publication of first paper on the logistic curve of population growth. | P. F. Verhulst. | Yule, G. U., Jour. Roy. Stat. Soc., vol. 88, pp. 1-58, 1925. |
| 1839 | Appointment of William Farr as compiler of abstracts in the Registrar-General’s Office. | William Farr (1807-1883). | Farr’s Vit. Stat., Edit. Humphrey. |
| 1839 | Organization of American Statistical Association. | | Hist. of Stat., p. 3. |
| 1846 | Publication of “Analyse mathématique sur les probabilités des erreurs de situation d’un point.” Acad. des Sci. Mém. par div. sav. IIe Sér. t. ix (Correlation). | Auguste Bravais (1811-1863). | Yule, Introd., p. 188. |
| 1848 | Foundation of the Institute of Actuaries of Great Britain and Ireland. | | Encyc. Brit. Art., “Actuary.” |
| 1860 | First complete Census of Switzerland. | | Rossiter, W. S., Loc. cit. |
| 1861 | First complete Census of Italy. | | Rossiter, W. S., Loc. cit. |
| 1863 | Austria establishes Central Statistical Commission. | Count Mercandin. | Meyer, R., Loc. cit., p. 89. |
| 1865 | Publication of “History of Mathematical Theory of Probability from the Time of Pascal to that of Lagrange.” | Isaac Todhunter (1820-1884). | Encyc. Brit. Art., Todhunter. |
| 1867 | First creation of independent official statistical organization in Hungary. | | Buday, L. von, Hist. Stat., p. 395. |
| 1869 | Publication of “Hereditary Genius.” | Sir Francis Galton (1822-1907). | Art. Galton, Encyc. Brit. |
| 1869 | Foundation of Société de statistique de Paris. | | Title page of Journal. |
| 1872 | Opening of German Imperial Statistical Office. | | Würzburger, E., Hist. Stat., p. 337. |
| 1881 | First general Census of India. | | Baines, A., Hist. Stat., p. 421. |
| 1887 | Royal Statistical Society incorporated by Royal Charter. | | Title page of Journal. |
| 1890 | First Census in which mechanical methods of tabulation were used. | John S. Billings and Herman Hollerith. | Rept. Supt. Census 1889, p. 8. |
| 1894 | Publication of first of “Contributions to the Mathematical Theory of Evolution” in Phil. Trans. Roy. Soc. | Karl Pearson. | Title page. |
| 1897 | Publication of paper “On the Theory of Correlation” in the Jour. Roy. Stat. Soc. | G. Udny Yule. | Jour. Roy. Stat. Soc., vol. lx, p. 812. |
| 1897 | First Census of Russia. | | Kaufman, A., Hist. Stat., p. 481. |
| 1900 | First year of separately published official mortality statistics for Registration Area of United States. | | Title page of “Mortality Statistics.” |
| 1901 | Publication of first number of Biometrika. | Francis Galton, Karl Pearson, W. F. R. Weldon, C. B. Davenport. | Title page. |
| 1902 | Creation of permanent Census Bureau in the United States. | | Cummings, J., Hist. Stat., p. 682. |
| 1915 | First year of separately published official birth statistics for Registration Area of United States. | | Title page of “Birth Statistics.” |

In  the  second  place  it  should  be  said  that  if  the  sources  chosen 
for  statement  of  reference  as  to  the  facts  are  obviously  in  some  cases 
second-hand,  and  perhaps  somewhat  casual,  this  is  so  of  deliberate 
purpose.  It  is  hoped  that  by  so  choosing  them  it  may  perchance 
be  possible  to  entice  an  unwary  student  or  so  to  do  a little  reading 
about  the  men  who  have  helped  to  develop  modern  statistics.  I am 
quite  sure  that  this  will  not  happen  if  he  is  referred  straight  off  to 
a ponderous  and  deadly  “Geschichte  der  Statistik.”  Nor  is  there 
much  chance  that  the  embryo  health-officer  or  medical  man  would 
make  anything  but  heavy  weather  if  he  essayed  a voyage  into  the 
“Theorie  analytique.”  But  if  he  will  read  the  article  in  the  En- 
cyclopedia Britannica  on  Laplace  he  will  tend  to  have  a measure 
of  wholesome  respect  for  a great  man,  and  will  know  a little  at 
least  of  what  that  man  meant  in  the  history  of  science. 

CAPTAIN  JOHN  GRAUNT 

Vital  statistics,  in  the  modern  sense  of  the  term,  may  be  said 
to  take  its  origin  from  the  publication,  in  1662,  of  a remarkable 
book for any age, but particularly so for that time, entitled Natural
and Political Observations Mentioned in a Following Index, and
Made upon the Bills of Mortality, by John Graunt, Citizen of
London  (1620-1674).  Bills  of  mortality,  consisting  of  lists  of 
burials,  marriages,  and  baptisms,  had  been  compiled  by  the  parish 
clerks  for  upward  of  a century  before  Graunt’s  time,  but  no  one 
before  him  had  conceived  the  idea  of  making  an  analytical  study 
of  these  observations  to  the  end  of  determining  the  basic  laws  of 
human  mortality,  natality,  and  movement  of  population.  From 
his  inadequate  and  meager  material,  as  measured  by  present 
standards,  Graunt  successfully  demonstrated  four  of  the  most 
important  facts  which  the  study  of  vital  statistics  to  this  day  has 
disclosed.  First,  he  made  clear  the  regularity  of  certain  vital 
phenomena  which  appear  to  be  merely  the  play  of  chance  in  their 
individual  occurrence.  Second,  he  first  pointed  out  the  excess  of 
male  over  female  births , and  the  approximately  equal  numbers  of 

Natural and Political
OBSERVATIONS
Mentioned in a following INDEX,
and made upon the
Bills of Mortality.

BY
Capt. JOHN GRAUNT,
Fellow of the Royal Society.

With reference to the Government, Religion,
Trade, Growth, Air, Diseases, and the
several Changes of the said CITY.

Non, me ut miretur Turba, laboro,
Contentus paucis Lectoribus.

The Fourth Impression.

OXFORD,
Printed by William Hall, for John Martyn,
and James Allestry, Printers to the
Royal Society, MDCLXV.

Fig.  1. — Facsimile  (actual  size)  of  the  title-page  of  the  first  treatise  on  vital  st 

the sexes in the population. Third, he demonstrated the relatively
high rate of mortality in the earliest years of life, and finally he discovered
that the urban death-rate is normally higher than the rural.

Besides  the  intrinsic  value  of  its  results,  Graunt’s  book  served 
for  many  years  as  the  stimulator  of  other  work  in  the  same  general 
field.  In  particular  it  is  probably  safe  to  conclude  that  Graunt’s 


Fig.  2. — Portrait  of  the  eminent  astronomer  and  mathematician,  Edmund  Halley 
(1656-1742),  who  was  the  first  person  to  construct  a life  table  on  sound  principles. 

book was the inciting agency which led the astronomers and mathematicians,
Huygens in Holland and Halley in England, to take up
the  problems  of  determining  by  appropriate  mathematical  methods 
the  probable  expectation  of  human  life  at  any  given  age.  Halley 
constructed  the  first  really  significant  mortality  table.  Some  of 
his  results  are  shown  in  Fig.  3. 

Fig. 3. — Survivorship distribution of the first life table (Halley’s). Reproduced
in facsimile from Baddam’s “Memoirs of the Royal Society,” vol. iii,
p. 36. This table “shews the number of persons living in the age current annexed
thereto.”

THE  MOST  ANCIENT  BILL  OF  MORTALITY 

The  earliest  known  bill  of  mortality  is  an  interesting  document. 
It  was  in  manuscript  form,  and  is  preserved  among  the  Egerton 
MSS.  at  the  British  Museum.  It  is  shown  in  facsimile  in  Fig.  4. 

Creighton9 believes its date to be 1532 (week of November 16th
to 23d), and gives evidence for his belief as to the year (Vol. I,
p. 295): “The extant bill for the week 16th to 23d November is
clearly one of a series; there are no good grounds for assigning it
to an earlier date than the year 1532, while there are reasons for
not placing it later. There are two other plague-bills extant, for
August, 1535, written out in a more clerkly fashion, and bearing
the marks of greater experience. The bill for the week in November
is more primitive in appearance; and we may fairly take
it as one of the series first ordered by the Council in 1532: for that
was the most considerable year of the plague immediately preceding
the outburst of 1535, to which the finished bills certainly belong.”
This earliest of official reports of vital statistics to be
preserved  is  transcribed  by  Creighton  (retaining  the  original 
spelling)  as  follows: 

Fig. 4. — Photographic reproduction of the earliest known bill of mortality: A,
obverse; B, reverse. Reduced to about one-half actual size. (For permission to
publish the photographic reproduction of this interesting document I am obliged to
Sir Frederick Kenyon, Director of the British Museum. The photographs were procured
for me by Mrs. Onera A. Merritt Hawkes, to whom I am greatly indebted for
this service. — R. P.)

Syns the xvith day of November unto the xxiii day of the same moneth ys dead
within  the  cite  and  freedom  yong  and  old  these  many  folowyng  of  the  plage  and  other 
dyseases. 

Inprimys benetts gracechurch i of the plage
S Buttolls in front of Bysshops gate i corse
S Nycholas  flesshammls  i of  the  plage 
S Peturs  in  Cornhill  i of  the  plage 
Mary  Woolnerth  i corse 
All Halowes Barkyng ii corses
Kateryn  Colman  i of  the  plage 
Mary  Aldermanbury  i corse 
Michaels  in  Cornhill  iii  one  of  the  plage 
All  halows  the  Moor  ii  i of  the  plage 
S Gyliz  iiii  corses  iii  of  the  plage 
S Dunstons  in  the  West  iiii  of  the  plage 
Stevens  in  Colman  Strete  i corse 
All  halowys  Lumbert  Strete  i corse 
Martins  Owut  Whiche  i corse 
Margett  Moyses  i of  the  plage 
Kateryn  Creechurch  ii  of  the  plage 
Martyns  in  the  Vintre  ii  corses 
Buttolls  in  front  Algate  iiii  corses 
S Olavs  in  Hart  Strete  ii  corses 
S Andros  in  Holburn  ii  of  the  plage 
S Peters  at  Powls  Wharff  ii  of  the  plage 
S Fayths  i corse  of  the  plage 
S Alphes  i corse  of  the  plage 
S Mathows  in  Fryday  Strete  i of  the  plage 
Aldermary  ii  corses 
S Pulcres  iii  corses  i of  the  plage 
S Thomas  Appostells  ii  of  the  plage 
S Leonerds  Foster  Lane  i of  the  plage 
Michaels  in  the  Ryall  ii  corses 
S Albornes  i corse  of  the  plage 
Sywtthyns  ii  corses  of  the  plage 
Mary  Somersette  i corse 
S Bryde  v corses  i of  the  plage 
S Benetts  Powls  Wharff  i of  the  plage 
All  halows  in  the  Wall  i of  the  plage 
Mary  Hyll  i corse. 

Sum  of  the  plage  xxxiiii  persons 
Sum  of  other  seknes  xxxii  persons 

The holl sum iiiˣˣ & vi.

And there is this weke clere iiiˣˣ and iii paryshes as by this bille doth appere
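
The totals at the foot of the bill can be checked arithmetically. The short Python sketch below is an editorial illustration and not part of Creighton's transcription. It assumes that the small "xx" written over a numeral denotes a score (twenty), so that the whole sum reads "three score and six"; the helper names `roman_to_int` and `score_value` are invented for the purpose.

```python
# Editorial sketch: check the totals of the earliest known bill of mortality.
# Assumption: a superscript "xx" over a numeral means "score" (times twenty).

ROMAN = {"i": 1, "v": 5, "x": 10, "l": 50, "c": 100}


def roman_to_int(numeral: str) -> int:
    """Convert a lower-case roman numeral to an integer.

    Handles both additive forms such as "iiii" and subtractive forms
    such as "ix".
    """
    values = [ROMAN[ch] for ch in numeral.lower()]
    total = 0
    for pos, value in enumerate(values):
        # A smaller value standing before a larger one is subtracted.
        if pos + 1 < len(values) and value < values[pos + 1]:
            total -= value
        else:
            total += value
    return total


def score_value(scores: str, units: str) -> int:
    """Value of a number written as so many score plus so many units."""
    return 20 * roman_to_int(scores) + roman_to_int(units)


plague = roman_to_int("xxxiiii")   # "Sum of the plage"
other = roman_to_int("xxxii")      # "Sum of other seknes"
whole = score_value("iii", "vi")   # "The holl sum": three score and six
clear = score_value("iii", "iii")  # parishes reported clear of plague

print(plague, other, plague + other, whole, clear)
```

On this reading the two partial sums, 34 deaths from plague and 32 from other sickness, add to the 66 of the whole sum, and 63 parishes are returned as clear for the week.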

SÜSSMILCH, QUETELET, AND FARR

The  next  considerable  contribution  to  vital  statistics,  as  such, 
was the publication of Die göttliche Ordnung in den Veränderungen
des Menschlichen Geschlechts aus der Geburt, dem Tode und der
Fortpflanzung desselben erwiesen, etc., by the Reverend Johann
Peter Süssmilch (1707-1767). Süssmilch was stimulated by
Graunt’s Observations to apply the same general sort of method to
the development of natural theology. This book exerted a great
influence in fields other than theological, and was the logical forerunner
of the great work of the famous Belgian vital statistician,
Lambert Adolph Jacques Quetelet (1796-1874), entitled Sur l’homme
et le développement de ses facultés, ou Essai de physique sociale,
published in 1835. Quetelet is the first great outstanding figure
in the development of modern vital statistics. Trained as a mathematician,
he brought to bear upon the data of human vital phenomena
a more adequate methodology than had before been applied.

The  present-day  procedure  in  official  vital  statistics  undoubtedly 
owes  more  to  William  Farr  (1807-1883)  than  to  any  other  person. 
Besides  this  he  may  fairly  be  regarded  as  the  greatest  medical 
statistician  who  has  ever  lived.  Greenwood14  says:  “But  if 
ultimately  Graunt  had  a worthy  disciple  in  the  medical  profession, 
it  was  not  until  he  had  been  in  his  grave  more  than  a century.  He 
died in 1674 and William Farr was born in 1807.”

In  this  paper  just  quoted  Greenwood  gives  the  best  existing 
brief  estimate  of  the  significance  of  Farr  in  the  history  of  medicine, 
and  it  may  properly  be  reproduced  here  in  full.  He  says: 

“The real revolutionary was a licentiate of the Society of
Apothecaries, a ‘Mr. Farr, a gentleman of the medical profession,’
who  was  appointed  Compiler  of  Abstracts  in  the  General  Register 
Office  on  July  10,  1839.  Although  Mr.  Noel  Humphreys  earned 
the  gratitude  of  all  medical  men  by  his  collection  of  Farr’s  writings, 
published  in  1885,  a really  adequate  edition  of  Farr  has  yet  to  be 
produced.  We  sometimes  dream  of  such  an  edition;  we  picture  it 
with  an  introduction  by  Farr’s  worthy  successor,  Dr.  Thomas 
Stevenson, and with footnotes and appendices by Dr. John Brownlee.
But it is an idle dream; governments in England, so the
newspapers  tell  us,  often  spend  money  in  odd  ways,  but  at  least 

Fig. 5. — Portrait of Lambert Adolph Jacques Quetelet (1796-1874).


Fig.  6. — Portrait  of  Dr.  William  Farr  (1807-1883). 

they have never been so eccentric as to waste it on the publication
of the collected works of great Englishmen. Farr was a very great
Englishman, and the characteristics of his genius were precisely
those which, in moments of self-esteem, we like to fancy are typically
English. We can make our point clear by contrasting him
with  two  great  men  who  were  at  their  prime  when  he  was  young, 
and  both  made  important  contributions  to  statistical  knowledge, 
Simeon  Poisson  and  George  Boole.  Poisson  wrote  a large  treatise 
upon  ostensibly  the  most  practical  of  subjects,  the  best  way  to 
secure  just  verdicts  in  courts  of  law;  Boole  dealt  with  the  very 
matter-of-fact  problem  of  numerical  approximation.  But  the  most 
superficial  reader  of  Poisson  or  of  Boole — not  that  their  works  are 
very  attractive  to  a hasty  reader — will  at  once  realize  that  the 
authors  are  far  more  interested  in  algebra  than  in  the  concrete 
applications  of  their  algebra.  Farr  has  left  many  pages  which, 
to  the  aforementioned  hasty  reader,  will  offer  almost  as  many 
algebraical  difficulties  as  even  Boole;  but  in  the  densest  forest  of 
symbols  Farr  never  loses  sight  of,  and  never  allows  his  companion  to 
lose  sight  of,  some  perfectly  definite  and  concrete  end  which  he 
proposes  to  reach. 

“No  branch  of  medical  or  vital  statistics  needs  for  its  cultivation 
a greater  variety  of  algebraical  tools  than  that  concerned  with  the 
production  of  complete  life  tables;  the  natural  faculty  which 
characterizes  the  born  mathematician  is  not,  indeed,  essential, 
but skill in the manipulation of symbols is. To Farr a life table
was —

‘An  instrument  of  investigation;  it  may  be  called  a biometer, 
for  it  gives  the  exact  measure  of  the  duration  of  life  under  given 
circumstances. Such a table has to be constructed for each district
and for each profession, to determine their degree of salubrity.
To  multiply  these  constructions,  then,  it  is  necessary  to  lay  down 
rules,  which,  while  they  involve  a minimum  amount  of  arithmetical 
labour,  will  yield  results  as  correct  as  can  be  obtained  in  the  present 
state  of  our  observations.’* 

“This  was  the  spirit  of  all  his  work.  He  faced  mathematical 

* From  a paper  contributed  to  the  Proceedings  of  the  Royal  Society  in  1859. 
(See  Farr’s  “Vital  Statistics,”  ed.  Humphreys,  London,  1885,  p.  492.) 

difficulties with a courage which nothing could daunt — it takes
some courage for a self-taught man to venture upon original research
within the province of the oldest of the sciences — when they
obstructed his progress toward a practical end. He never attempted
to compete with the masters of pure analysis on their own
ground.  We  have  been  the  gainers.  The  greatest  mathematical 
statisticians  of  the  first  half  of  the  nineteenth  century  were  not 
Englishmen;  we  have  not  to  our  credit  any  theoretical  work  of  that 
date  which  will  compare  with  the  researches  of  Laplace  and  of 
Poisson  in  France  or  of  Gauss  in  Germany;  but  of  no  civilized 
country can a record of fatal disease be constructed with the precision
which appertains to the medico-statistical history of England
and  Wales  since  1840. 

“The  practical  advantages  to  the  physician  and  the  sanitarian 
are  enormous.  Matters  which  our  great  grandparents  fiercely 
debated,  topics  respecting  which  only  a very  shrewd  and  experienced 
physician  of  1820  could  form  an  opinion,  are  now  within  the  compass 
of  a junior  medical  student.  If  Farr  had  been  born  a generation 
earlier  and  the  General  Register  Office  had  been  founded  in  1807 
instead  of  in  1837,  the  sanitary  history  of  our  manufacturing  towns 
might  have  been  different.  If  even  the  lessons  he  taught  year  by 
year  had  sunk  into  the  minds  of  all  members  of  our  profession, 
many  disappointments  would  have  been  spared  and  perhaps  some 
false  apprehensions  quieted.  The  curious  reader  of  old  blue-books 
will  find  much  of  interest  in  the  census  reports  of  Lamb’s  friend 
Rickman,  but  Rickman  was  not  a Farr.  Rickman,  for  instance 
(in  1831),  commented  upon  the  apparent  unhealthiness  of  the 
northern  manufacturing  districts,  but  he  could  not  speak  with  much 
authority,  for  his  basis  of  facts  was  no  more  than  an  abstract  of 
burial  and  baptismal  registers.  These  are  the  words  of  Farr  (from 
the  supplement  to  the  thirty-fifth  Annual  Report): 

‘Take  for  example  the  group  of  51  districts  called  healthy  for 
the sake of distinction, and here it is found that the annual mortality
per cent. of boys under five years of age was 4.246; of girls,
3.501.  Turn  to  the  district  of  Liverpool,  the  mortality  of  boys 
was  14.475;  of  girls,  13.429.  Here  it  is  evident  that  some  pregnant 
exceptional  causes  of  death  are  in  operation  in  this  second  city  of 
England. What are these causes? Do they admit of removal?
If  they  do  admit  of  removal,  is  this  destruction  of  life  to  be  allowed 
to  go  on  indefinitely?  It  is  found  that  of  10,000  children  born 
alive  in  Liverpool  5396  live  five  years,  a number  that  in  the  healthy 
districts  could  be  provided  by  6544  annual  births.’ 

“The ‘dear old doctor’ — as Mr. Humphreys called him — could
round  a period  in  the  early  Victorian  style  with  the  best;  the  classical 
quotations  in  his  reports  might  have  tempted  William  Pitt  or 
Charles  Fox  to  become  statisticians;  but  he  could  also  use  very 
plain  English  indeed.  Statistics  with  plain  English  as  a propellent 
are  formidable  missiles. 

“We  could  fill  many  columns  with  examples,  but  we  must  take 
leave  of  the  greatest  of  medical  statisticians  with  one  observation. 
Farr’s  work  has  on  it  the  seal  of  all  supreme  achievements;  it  is 
indestructible.  It  was,  of  course,  a piece  of  good  luck  that  his 
three  successors,  the  late  Dr.  William  Ogle,  Dr.  John  Tatham,  and 
Dr.  Thomas  Stevenson,  were  men  having  the  same  ideals  and  zealous 
to  build  higher  upon  his  foundations.  The  nation,  we  hope,  will 
always  be  fortunate  enough  to  secure  equally  worthy  spiritual 
descendants of the founder. But no weakness of human instruments
or credible deteriorations of the system could ever take from
the General Register Office the power of ‘rendering immense
service  to  sanitary  science  by  enabling  it  to  use  exact  numerical 
standards  in  place  of  the  former  vague  adjectives.’*  So  far  as 
records  of  mortality  are  concerned,  the  real  reformer  is  one  who 
treads  accurately  in  the  footprints  of  William  Farr.” 
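
The Liverpool comparison which Greenwood quotes from Farr lends itself to a simple arithmetical check. The Python sketch below is an editorial illustration using only the figures given in the quotation. The survival proportion for the healthy districts is not stated by Farr in this passage; it is inferred here from his remark that 6544 births in those districts would furnish the same 5396 five-year survivors. All variable names are ours.

```python
# Editorial sketch: arithmetic implied by Farr's Liverpool comparison.
# Every figure is taken from the quoted passage; names are illustrative only.

births_liverpool = 10_000           # children born alive in Liverpool
survivors_at_five = 5_396           # of whom this many live five years
births_healthy_equivalent = 6_544   # births yielding as many survivors
                                    # in the healthy districts

# Proportion of births surviving to age five.
survival_liverpool = survivors_at_five / births_liverpool
# Inferred, not stated by Farr: same survivors from fewer births.
survival_healthy = survivors_at_five / births_healthy_equivalent

# Births that Liverpool "spends" beyond the healthy-district requirement.
excess_births = births_liverpool - births_healthy_equivalent

# Annual mortality per cent. under five years of age.
annual_mortality = {
    "boys": {"healthy": 4.246, "liverpool": 14.475},
    "girls": {"healthy": 3.501, "liverpool": 13.429},
}
mortality_ratio = {
    sex: rates["liverpool"] / rates["healthy"]
    for sex, rates in annual_mortality.items()
}

print(f"survival to five, Liverpool: {survival_liverpool:.1%}")
print(f"survival to five, healthy districts: {survival_healthy:.1%}")
print(f"excess births per 10,000: {excess_births}")
for sex, ratio in mortality_ratio.items():
    print(f"mortality ratio, {sex}: {ratio:.2f}")
```

On these figures about 54 per cent. of Liverpool children reached their fifth birthday, against roughly 82 per cent. in the healthy districts, and the annual mortality under five in Liverpool was between three and four times that of the healthy districts for both sexes.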

THE  HISTORY  OF  BIOMETRY 

The application of statistical methods to the study of biologic
problems other than those of anthropology, and of vital
statistics  in  the  narrower  sense,  may  be  said  to  have  begun  with 
the work of the late Sir Francis Galton. Galton was a born statistician.
He tells in his Memories13 of the instinct, which he inherited
from  his  father,  to  arrange,  classify,  and  collect  statistics  about  all 
sorts  of  things.  At  the  same  time  he  was  deeply  interested  in 
problems of biology, particularly those having to do with inheritance.

* Simon: English Sanitary Institution, p. 212.

His interest in this direction crystallized into definite
activity  at  about  the  time  that  his  cousin,  Charles  Darwin,  was 
elaborating  his  theory  of  heredity,  which  was  called  pangenesis. 
Galton  instantly  realized  that  this  conception  of  the  physiology 
of  the  hereditary  process  was  essentially  statistical  in  character, 
and  that  statistical  methods  were  demanded  to  test  and  broaden 
it.  Upon  this  work  he  therefore  embarked  with  the  vigor  and 


Fig.  7. — Portrait  of  Francis  Galton  (1822-1907).  (For  permission  to  publish  this 
portrait  here  I am  indebted  to  Dr.  G.  H.  Shull,  Editor  of  Genetics.) 

ardent  enthusiasm  which  characterized  all  of  his  scientific  work. 
His  results  found  expression  in  a series  of  memoirs  and  books  which 
have  become  classics  in  biologic  science.  Of  these  the  most 
important is perhaps Natural Inheritance, since in it are brought
to a focus a number of different lines of work which engaged Galton’s
thought and energy for many years. In this book the attempt
is made for the first time to determine, on a statistical basis, the
degree of resemblance, in respect of bodily, mental, and temperamental
traits, which obtains between relatives of different
degrees.  Previously  no  attempt  had  been  made  to  measure 
precisely  these  resemblances,  which  were,  of  course,  a matter  of 
common  observation,  though  not  of  precise  definition,  to  everyone. 

In  order  to  make  the  desired  analysis  of  this  problem  it  was 
necessary for Galton to devise new methods of dealing with statistics.
The general mathematical foundations of statistical science
had,  to  be  sure,  been  laid  by  the  mathematicians  Laplace  and 


Fig.  8. — Portrait  of  Pierre  Simon  Laplace  (1749-1827). 


Gauss,  and  some  progress  in  the  application  of  these  methods  had 
been  made  by  Quetelet.  But  none  of  these  men  had  dealt  specifically 
with the measurement of what are now known as correlated variations.
From Galton’s point of viewing the problem of heredity
such  a measure  was  an  absolute  necessity.  He,  therefore,  devised 
one.  It  was  not  altogether  a perfect  one,  but  was  practically  usable, 
and  led  very  shortly  to  developments  which  furnished  the  entirely 
adequate  measure  which  he  had  sought. 

To the end of his life Sir Francis Galton retained his interest in
the  science  of  biometry,  of  which  he  may  truly  be  said  to  have  been 
the  founder.  His  keenness  of  interest  served  in  great  part  as  the 
primal  inspiration  and  stimulus  which  led  two  other  distinguished 
English workers to enter this field and begin to rear the superstructure


Fig. 9. — Portrait of Karl Pearson, F. R. S.


on the foundation already laid. These were Professor
Karl  Pearson  of  University  College  and  the  late  Professor  W.  F.  R. 
Weldon.  To  Professor  Pearson  belongs  the  very  great  credit  of 
developing  adequate  and  general  mathematical  methods  for  the 
analysis  of  biologic  statistics.  Statistical  mathematics  in  the  main 
fall within the realm of the calculus of probability. The foundations
of that calculus were laid by Laplace and Gauss, as has already
been  pointed  out.  Since  their  day  the  most  notable  fundamental 
advance  in  the  mathematical  theory  of  probability  has,  in  the 
writer’s judgment, been due to the genius of Karl Pearson. Starting
from the sound position that the facts of nature are of more
importance  than  any  theory,  Pearson  in  three  classic  memoirs, 
in his series of Mathematical Contributions to the Theory of Evolution,


Fig. 10. — Portrait of G. Udny Yule, F. R. S. (Photo: Russell.)

developed a theory of skew frequency curves, and skew correlation,
which took due account of the asymmetry so frequently seen
in  chance-determined  phenomena.  This  system  of  skew  frequency 
curves  has  now  had  the  test  of  more  than  twenty-five  years’  usage. 
Every  attempt  at  destructive  criticism  which  has  been  made 
against  it  has  failed.  None  of  the  substitutes,  some  of  which  have 
been proposed by eminent mathematicians, has shown any approach
to the generality and elegance of these curves.

Few biologists have an adequate conception of the extent to
which  biometry  is  indebted  to  Professor  Karl  Pearson.  If,  as  has 
been  maintained,  every  real  advance  in  science  depends  upon  the 
discovery  and  perfection  of  a new  technic,  then,  for  whatever 
advance  in  biology  may  come  through  biometry,  the  debt  to  that 
distinguished  investigator  will  be  large  for  many  years  to  come. 

The  English  may  perhaps  fairly  be  said  to  have  led  the  world  in 
the  development  of  modern  statistical  theory  and  practice.  In 


Fig.  11.— Portrait  of  Major  Greenwood,  F.  R.  S. 


addition  to  the  achievements  of  Graunt,  Halley,  Farr,  Galton, 
Weldon  and  Pearson,  who  have  been  discussed  in  this  chapter, 
some  mention,  at  least,  must  be  made  of  a number  of  other  English 
workers,  who  have  made  fundamental  contributions,  notably 
De  Moivre,  F.  Y.  Edgeworth,18  W.  F.  Sheppard,  G.  Udny  Yule, 
L. Isserlis, and H. E. Soper. In the application of biometric methods
to specifically medical problems, English workers, notably
Prof. Major Greenwood of the London School of Hygiene and
Tropical Medicine, from whose work we have already quoted, and
the  late  Dr.  John  Brownlee  have  taken  a leading  part.  These 
workers  and  their  associates  have  made  notable  contributions  to 
the  understanding  of  some  of  the  most  difficult  problems  of  etiology 
and  epidemiology. 

Aside  from  the  English  perhaps  the  most  outstanding  school  in 
the  development  of  statistical  theory  and  practice  has  been  the 
Scandinavian.  Here  the  important  names  are  those  of  J.  P.  Gram, 
T.  N.  Thiele,  C.  V.  L.  Charlier,  S.  D.  Wicksell,  and  Arne  Fisher, 
who has for many years made his home in America. The Scandinavian
school is chiefly noted for a system of skew curves based
upon  the  semi-invariants  of  Thiele. 

CHAPTER III


THE  RAW  DATA  OF  BIOSTATISTICS 

Broadly  there  are  three  ways  in  which  statistical  data  are 
accumulated  in  the  realm  of  human  biology.  These  are: 

1.  The  census  method. 

2.  The  registration  method. 

3.  The  ad  hoc  or  case  record  method. 

Of  these  the  first  two  are  the  methods  of  official  vital  statistics, 
while  the  third  is  par  excellence  the  method  of  medicine  and 
biometry. 

In  the  present  chapter  we  shall  discuss  some  aspects  of  the  first 
two  methods,  while  in  Chapter  V a more  detailed  discussion  of  the 
third  method  will  be  undertaken. 

THE CENSUS METHOD

Theoretically  a census  is  a count,  made  at  a single  specified 
instant  of  time,  of  a population  in  respect  of  certain  attributes  of 
the  persons  composing  the  population,  or  of  things.  Practically, 
of  course,  the  “instant  of  time”  is  rather  stretched  out,  but  the 
endeavor  is  always  made,  and  with  a fair  degree  of  success,  to  have 
the  information  gleaned  referable  to  a single  day. 

All  living  things  and  all  their  affairs  and  concerns  and  attributes 
are  continually  changing  with  greater  or  less  degrees  of  rapidity. 
The  living  world,  in  short,  is  in  a state  of  continuous  flux.  It  may 
be  thought  of  as  a vast  stream,  constantly  added  to  by  births,  and 
subtracted  from  by  deaths,  diverted  (but  only  slowly)  from  its 
previous  pathway  by  divers  impinging  forces,  but  always  and 
above  all,  moving,  flowing. 

Now  a census  attempts  to  acquire  knowledge  of  the  composition 
and  characteristics  of  this  great  stream  by  examining  carefully,  at 
regular  intervals  of  time  (usually  ten  years  apart),  an  instantaneous 
cross-section of it. What happened before the cross-section was
taken, or what will happen after it is taken, can only be inferred,
when the census method of acquiring statistical information is employed,
from the characteristics of the cross-section itself.
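
The idea of a census as an instantaneous cross-section of a moving stream can be put in concrete form. The Python sketch below is an editorial illustration only: the five life records are hypothetical, invented for the example, and the names `Person` and `census_count` are ours. It assumes, for simplicity, that each life is known to the nearest year.

```python
# Editorial sketch: a census as an instantaneous cross-section of a population.
# The records below are hypothetical and serve only to illustrate the idea.
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Person:
    born: int              # year of birth
    died: Optional[int]    # year of death, or None if still living


def census_count(people, census_year: int) -> int:
    """Count the persons alive at the census instant.

    A person is counted if born on or before the census year and not
    yet dead in that year.
    """
    return sum(
        1
        for p in people
        if p.born <= census_year and (p.died is None or p.died > census_year)
    )


# Hypothetical stream of lives (not real data).
people = [
    Person(born=1895, died=1925),
    Person(born=1901, died=None),
    Person(born=1912, died=1918),   # lives and dies between two censuses
    Person(born=1919, died=None),
    Person(born=1924, died=None),
]

# Decennial cross-sections of the same stream.
counts = {year: census_count(people, year) for year in (1910, 1920, 1930)}
print(counts)
```

In this invented example the person born in 1912 and dead in 1918 appears in no cross-section at all, and the counts for 1920 and 1930 are equal although a birth and a death fell between them. Such events between censuses can be recovered from the cross-sections only by inference.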

Censuses are taken either (a) by enumerators, (b) by questionnaires
filled up by the victims themselves, or (c) by the two means
in  combination.  The  first  method  is  the  one  chiefly  employed  in 
the  United  States.  A person  visits  every  household  in  a limited 
area  on  or  near  census  day,  and  by  personal  inquiry  elicits  the 
desired information. The second method is the one chiefly employed
in England, where there is placed in the hands of each
householder  a little  time  before  census  day  a questionary  form 
which  he  must  truthfully  and  promptly  fill  in,  under  rather  heavy 
penalty  of  the  law  for  failure. 

The  data  of  value  in  biostatistics  for  which  dependence  is 
chiefly  put  on  the  census  method  at  the  present  time  are  those 
relating  to  the  living  population,  its  numbers,  age,  sex,  occupation, 
race,  etc. 

The  path  of  census  taking,  while  theoretically  straightforward, 
is  actually  beset  with  difficulties  and  hazards,  both  general  and 
specially technical. In the first place, there is an ancient and persistent
opposition on the part of the people at large to being counted
by  the  government.  Ignorance  and  superstition  combine  to  create 
antipathy  to  censuses.  Who  knows  whether  the  government  is 
not using this opportunity, by some subtle and diabolical machinations,
to snoop about and pry out some information regarding the
individual,  or  his  business,  or  his  wealth,  or  his  love  affairs,  which 
may  later  be  used  to  bring  about  his  discomfiture?  That  this 
feeling,  of  which  there  is  evidence  in  the  most  ancient  historical 
records, persists to the present day is clearly manifest in the proclamation
issued by President Herbert Hoover, on November 22,
1929,  calling  for  the  regular  census  of  the  United  States  in  1930. 
This  proclamation  reads  as  follows: 

“By the President of the United States of America.

“A  PROCLAMATION. 

“Whereas,  by  the  Act  of  Congress  approved  June  18,  1929,  the  fifteenth 
decennial  census  of  the  United  States  is  to  be  taken  beginning  on  the  first  day 
of  April,  nineteen  hundred  and  thirty;  and 

“Whereas, A correct enumeration of the population every ten years is required
by the Constitution of the United States for the purpose of determining the representation
of the several States in the House of Representatives; and

“Whereas,  It  is  of  the  utmost  importance  to  the  interest  of  all  the  people  of 
the  United  States  that  this  census  should  be  a complete  and  accurate  report  of 
the  population  and  resources  of  the  nation; 

“Now, therefore, I, Herbert Hoover, President of the United States of America,
do hereby declare and make known that, under the law aforesaid, it is the
duty  of  every  person  to  answer  all  questions  on  the  census  schedules  applying 
to  him  and  the  family  to  which  he  belongs  and  to  the  farm  occupied  by  him  or 
his  family,  and  all  other  census  schedules  as  required  by  law,  and  that  every 
person  refusing  to  do  so  is  subject  to  penalty. 

“The  sole  purpose  of  the  census  is  to  secure  general  statistical  information 
regarding  the  population  and  resources  of  the  country,  and  replies  are  required 
from  individuals  only  to  permit  the  compilation  of  such  general  statistics.  No 
person  can  be  harmed  in  any  way  by  furnishing  the  information  required.  The 
census  has  nothing  to  do  with  taxation,  with  military  or  jury  service,  with  the 
compulsion  of  school  attendance,  with  the  regulation  of  immigration  or  with  the 
enforcement  of  any  national,  state,  or  local  law  or  ordinance. 

“There  need  be  no  fear  that  any  disclosure  will  be  made  regarding  any 
individual  or  his  affairs.  For  the  due  protection  of  the  rights  and  interests  of 
the  persons  furnishing  information  every  employee  of  the  Census  Bureau  is 
prohibited,  under  heavy  penalty,  from  disclosing  any  information  which  may 
thus  come  to  his  knowledge. 

“I, therefore, earnestly urge upon all persons to answer promptly, completely
and accurately all inquiries addressed to them by the enumerators or
other  employees  of  the  Census  Bureau,  and  thereby  contribute  their  share  toward 
making  this  great  and  necessary  public  undertaking  a success. 

“In  witness  whereof,  I have  hereunto  set  my  hand  and  caused  to  be  affixed 
the  Great  Seal  of  the  United  States. 

“Done  at  the  city  of  Washington,  this  22d  day  of  November,  in  the  year  of 
our  Lord  one  thousand  nine  hundred  and  twenty-nine,  and  of  the  independence 
of  the  United  States,  the  one  hundred  and  fifty-fourth. 

“Herbert  Hoover, 

“President  of  the  United  States, 

“By  the  President, 

“Henry  L.  Stimson, 

“Secretary  of  State.” 

The  questions  asked  by  the  enumerator  in  the  1930  census, 
regarding  which  the  proclamation  is  at  such  pains  to  reassure  the 
people,  covered  the  following  points: 

1. Relationship to head of family, including a statement as to the home-maker
in each family.

2.  Whether  the  home  is  owned  or  rented. 

3.  Value  of  home,  if  owned,  or  monthly  rental,  if  rented. 

4.  Radio  set?  (“Yes”  or  “No.”) 

5. Does this family live on a farm? (“Yes” or “No.”)

6.  Sex. 

7.  Color  or  race. 

8.  Age  at  last  birthday. 

9.  Marital  condition. 

10.  Age  at  first  marriage.  (For  married  persons  only.) 

11. Attended school or college any time since September 1, 1929? (“Yes” or
“No.”)

12.  Whether  able  to  read  or  write?  (“Yes”  or  “No.”) 

13.  Place  of  birth  of  person.  (State  or  country.) 

14.  Place  of  birth  of  person’s  father.  (State  or  country.) 

15.  Place  of  birth  of  person’s  mother.  (State  or  country.) 

16.  Mother  tongue  of  each  foreign-born  person. 

17.  Year  of  immigration  to  the  United  States.  (For  foreign  born  only.) 

18. Whether naturalized. (For foreign born only.)

19.  Whether  able  to  speak  English.  (For  foreign  born  only.) 

20.  Occupation  of  each  gainful  worker. 

21.  Industry  in  which  employed. 

22.  Whether  employer,  employee,  or  working  on  own  account. 

23. Whether actually at work. (For each person usually employed but returned as not at work, additional information will be secured on a special unemployment schedule.)

24. Whether a veteran of the United States military or naval forces; and for each veteran, in what war or expedition he served.

Several  of  these  questions  were  new  in  United  States  Census 
practice.  Among  the  most  important  of  these  new  questions  is 
that  calling  for  the  value  of  the  home  if  owned,  or  the  monthly 
rental  if  rented.  This  makes  possible  a classification  of  families 
according  to  economic  status,  or,  it  is  perhaps  hoped,  according  to 
buying  power.  Such  a classification  was  urgently  desired  by 
individuals and firms using the census figures as a basis for organizing
their selling and advertising campaigns and will doubtless serve
many  other  purposes.  It  was  promised  that  the  replies  to  these 
questions  would  be  used  only  as  a basis  for  classification  of  the 
families  into  broad  groups. 

Another  new  question  is  that  which  asks  for  the  age  at  first 
marriage.  This  serves  two  purposes.  In  the  first  place  it  gives 
definite  information  as  to  the  relative  age  at  marriage  of  persons 
in  different  racial  and  economic  groups.  In  the  second  place  it 
makes  possible  a tabulation  of  important  data  on  the  size  of  families, 
such  tabulation  to  be  based  on  the  number  of  children  reported  in 
the  families  of  women  who  have  been  married  a number  of  years. 
The data so obtained should be of great value for the study of
differential  fertility  and  other  population  problems. 

In  the  classification  of  gainful  workers  according  to  occupation 
and  industry  much  greater  stress  than  heretofore  was  put  on  the 
returns  for  industry.  The  enumerators  were  instructed  to  pay 
special  attention  to  this  section  of  the  schedule. 

Women  doing  housework  in  their  own  homes  (or  supervising 
such work done by servants) and carrying the other responsibilities
of the home were designated as home-makers. This designation
was entered in the family relationship column of the schedule,
rather  than  in  the  occupation  column,  in  order  that  those  women 
who  follow  a profession  or  other  gainful  occupation,  in  addition  to 
being  home-makers,  might  be  properly  classified  in  respect  to  both 
lines  of  activity. 

A special  schedule  for  unemployment  contained  a number  of 
questions  designed  to  separate  those  not  working  into  several  classes, 
including,  besides  those  absolutely  unemployed,  those  who  had  a 
job  but  were  for  the  time  being  on  lay-off  without  pay,  etc. 

In the classification by color or race a special group was provided
for Mexicans, in which were placed all persons of Mexican
origin  except  those  of  strictly  white  ancestry,  who  were  counted 
as  heretofore  with  the  whites,  and  possibly  a small  number  who 
were  classified  as  Indians. 

Provision  was  again  made  for  classifying  the  foreign  born, 
which  still  form  a very  important  element  in  the  population,  in 
five  different  ways:  namely,  by  country  of  birth;  by  mother  tongue 
(which  is  sometimes  a better  index  of  nationality  than  is  country 
of birth); by year of immigration to the United States; by citizenship
(that is, whether naturalized, having first papers, or alien);
and  by  ability  to  speak  English. 

There  are  many  technical  difficulties  in  getting  completely 
accurate  results  in  census  taking.  Indeed  only  an  approximation 
is  ever  obtained.  Such  approximation  is  probably  closest  in  respect 
of  the  total  number  of  the  population,  and  is  less  good,  in  varying 
degrees,  relative  to  such  matters  as  age,  occupation,  national 
origin,  etc.  It  is  impossible  to  go  here  into  detail  regarding  all 
the  difficulties  of  the  census  method.  One  only  may  be  discussed 
somewhat fully, because it involves, next to the bare count, the
most  important  datum  for  which  the  vital  statistician  is  dependent 
upon the census. This is the age distribution of the living population.

It  has  been  the  universal  experience  in  census  work  that  there 
are  two  outstanding  errors  in  census  results  respecting  age.  One 
is  that  the  number  of  infants  under  one  year  of  age,  and  between 
one  and  two  years  of  age,  is  always  understated  in  the  census 
returns.  It  is  certain  that  there  are  always  more  living  infants  of 
these  ages  than  the  census  count  shows.  The  second  error  is  that 
the  return  always  shows  an  excess  of  persons  at  certain  particular 
years  of  age.  This  concentration  is  most  marked  on  ages  which 
are  multiples  of  5 and  10,  but  there  is  also  observable  a tendency 
in  lesser  degree  to  concentrate  on  even  ages,  as  contrasted  with 
odd, among persons less than twenty years of age. These concentrations
are regarded as due to a well-known trait of human nature
to  state  ages  in  round  numbers,  especially  in  cases  where  the 
enumerator  is  obliged  to  get  his  information  second  hand  because 
the  person  concerned  is  away  from  the  domicile  when  the  call  is 
made,  and  the  person  who  does  the  answering  literally  has  no 
exact  knowledge  of  the  absent  one’s  age. 

An  excellent  discussion  of  this  matter  may  be  quoted  from 
Population, Part II of Census Reports, Volume II, of the Twelfth
Census  of  the  United  States  (1900),  p.  xxxv. 

“Evidences  of  concentration  were  noticeable  in  the  census  returns  of  1890, 
in  spite  of  the  fact  that  in  the  printed  instructions  at  that  census  the  attention 
of  the  enumerators  was  directed  specially  to  these  inaccuracies  in  the  return  of 
ages,  and  that  they  were  cautioned  not  to  accept  such  indefinite  statements  with- 
out first  endeavoring  to  secure  the  exact  year  of  age.  This  specific  instruction 
had  some  effect,  apparently,  in  lessening  the  extent  to  which  ages  were  given  in 
round  numbers;  but  it  is  evident,  as  stated  in  the  report  for  1890,  that  ‘no  matter 
how  specific  the  instructions  to  the  enumerators  on  this  point  may  be,  the  natural 
tendency  is,  and  probably  always  will  be,  to  give  the  nearest  five  or  ten  year 
period,  especially  where  definite  information  is  not  at  hand.’  It  was  further 
suggested  in  the  same  report  that  probably  this  tendency  could  be  obviated  in 
part  ‘by  requiring  the  return  of  ages,  so  far  as  possible,  by  the  exact  day,  month, 
and  year  of  birth  and  allowing  a return  of  the  age  by  the  approximate  year  in 
only  those  cases  where  it  is  manifestly  impossible  to  ascertain  the  date  of  birth, 
on  the  assumption  that  a fairly  good  approximation  is  better  than  no  return 
at  all.’  ” 

As a means of determining by a practical test whether or not
it  was  possible,  under  existing  methods  of  census  enumeration,  to 
obtain  a better  age  distribution  of  the  population,  an  effort  was 
made  at  the  1900  census  to  secure,  wherever  possible,  a return  of 
ages  with  a statement  of  the  month  and  year  of  birth.  As  was  to 
be  expected,  this  did  not  prevent  the  return  of  an  abnormally  large 
proportion  of  persons  as  of  the  ages  thirty,  thirty-five,  forty, 
forty-five,  fifty,  etc.,  but  a comparison  of  the  figures  for  1900 
with  those  for  the  two  preceding  enumerations  shows  that  the  more 
exact  inquiry  with  respect  to  age  in  1900  reduced  materially  the 
concentration  at  these  ages. 

Table  1 (page  70),  a portion  of  one  of  the  tables  of  the  1900 
census  (Table  XX  of  the  volume  cited),  shows  this  phenomenon  of 
concentration  in  percentage  form  for  native  whites,  foreign  born 
whites,  and  colored,  and  demonstrates  the  improvement  over  the 
three censuses, 1880, 1890, and 1900. This table gives the percentage
which the excess at each designated age (over the immediately
preceding year of age) is of the number in 100,000 of the
population  at  the  designated  age. 
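
The measure just defined can be written out as a short calculation. The sketch below is an illustration only: the function name `percent_excess` and the sample counts are hypothetical and are not taken from the census volume.

```python
def percent_excess(count_at_age: float, count_at_preceding_age: float) -> float:
    """Excess at the designated age over the immediately preceding year of age,
    expressed as a percentage of the number at the designated age.

    Both counts are assumed to be numbers per 100,000 of the population.
    """
    if count_at_age <= 0:
        raise ValueError("count_at_age must be positive")
    return 100.0 * (count_at_age - count_at_preceding_age) / count_at_age


# Hypothetical numbers per 100,000 (illustrative, not census figures):
# 1,500 persons returned as aged 30 against 1,200 returned as aged 29.
example = percent_excess(count_at_age=1500.0, count_at_preceding_age=1200.0)
print(round(example, 1))  # 20.0
```

On these assumed figures one-fifth of the persons returned at the designated age are in excess of the number returned at the age just below it.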

The  conclusions  to  be  drawn  from  Table  1 are  first,  that  at  the 
census  of  1890  the  most  striking  concentration  in  the  earlier  period 
of  life  was  on  two  years  of  age,  the  excess  on  that  age  representing 
more  than  one-third  of  the  relative  number  of  males  and  females 
respectively,  for  each  element  of  the  population  considered.  The 
great  concentration  on  this  particular  age  was  undoubtedly  due  to 
the  form  of  inquiry  in  1890,  the  schedule  used  at  that  census  calling 
for  a return  of  “age  at  nearest  birthday”  instead  of  “age  at  last 
birthday,”  as  at  earlier  censuses  and  at  the  1900  census.  In  1900 
the  concentration  on  ages  is  very  slight,  and  the  improvement  in 
the  return  of  ages  at  that  census  is,  generally  speaking,  apparent 
throughout  the  table,  especially  for  the  earlier  periods  of  life. 

For  both  native  white  and  foreign  white  persons  the  concentration 
in 1900 on the five-year periods after twenty-five years, although considerable,
is very much less than at the preceding census. For colored
persons, however, the improvement in this respect is not so
marked,  the  percentages  of  excess  in  1900  being  large,  although 
somewhat  less,  in  each  case,  than  those  shown  for  1880  and  1890. 

TABLE 1

Excess of Native White, Foreign White, and Colored Males and Females,
Respectively, Reported for Certain Years of Age in the United States
Censuses of 1880, 1890, and 1900 ¹

Percentages

Per cent. of excess of number in 100,000, for each specified age. Entries
for each age follow the column order Native white (1900, 1890, 1880),
Foreign white (1900, 1890, 1880), Colored ² (1900, 1890, 1880). “....”
marks a blank cell; a row with fewer than nine entries has further blank
cells whose columns are not indicated.

MALES.

2 years: 2.6, 37.8, 11.3, 7.3, 34.3, 15.7
4 years: 0.1, 1.6, 1.9, 5.7, 4.1
6 years: 0.8, 2.5, 0.1, 2.2, 8.1, 4.0
8 years: 0.2, 1.9, 7.6, 4.3
10 years: 2.6, 6.7, 7.1, 10.4, 20.2, 20.6
12 years: 0.2, 12.5, 11.5, 18.6, 31.6, 33.2
14 years: 0.7, 7.3, 1.7, 5.3, 4.7
16 years: 0.4, 5.2, 3.8, 1.1, 3.0
18 years: ...., 4.6, 10.7, 19.5, 18.6, 14.8, 10.2, 16.0, 26.1
20 years: 0.1, 17.1, 22.8, 17.1, 13.5, 19.7
21 years: 0.1, 13.8, 4.1, 1.1, 1.3, 4.6
25 years: 10.2, 11.0, 18.3, 13.2, 16.6, 29.7
30 years: 15.1, 26.7, 28.9, 32.4, 41.0, 50.8, 45.1, 54.0, 69.4
35 years: 3.8, 16.2, 23.1, 20.5, 31.5, 44.9, 39.8, 55.9, 69.8
40 years: 10.2, 28.0, 31.5, 29.6, 50.3, 60.4, 49.1, 58.2, 76.2
45 years: 9.1, 27.6, 25.7, 23.3, 44.1, 53.3, 52.2, 66.7, 76.2
50 years: 14.8, 31.9, 32.9, 33.0, 51.4, 61.3, 57.3, 65.0, 79.5
55 years: 4.3, 8.5, 14.5, 19.9, 27.2, 37.1, 43.3, 58.6
60 years: 12.3, 36.3, 33.8, 33.7, 57.4, 61.9, 68.5, 78.0, 86.8
65 years: ...., 13.0, 13.4, 17.9, 24.6, 35.2, 58.0, 65.6, 74.2
70 years: 4.2, 29.0, 16.2, 25.1, 45.6, 47.2, 65.1, 73.1, 82.9
75 years: 6.6, 20.0, 24.5, 58.6, 65.1, 74.5
80 years: 17.4, 4.7, 8.9, 36.2, 43.6, 66.3, 75.7, 83.5
85 years: 47.8, 50.0, 60.0
90 years: 16.7, 35.7, 30.0, 62.5, 70.6, 82.4

FEMALES.

2 years: 2.8, 38.1, 11.0, 6.0, 34.2, 14.4
4 years: 0.1, 0.3, 1.3, 2.2
6 years: 1.2, 2.8, 0.5, ...., 2.9, 8.8, 5.8
8 years: 1.9, 8.6, 5.5
10 years: 2.8, 5.1, 5.3, 9.2, 15.7, 15.7
12 years: 11.4, 9.1, 18.0, 30.6, 30.7
14 years: 5.3, 2.5, 5.1, 2.4
16 years: 1.3, 8.0, 7.1, 5.6, 12.2, 6.8
18 years: 0.8, 10.2, 16.9, 20.6, 23.4, 18.9, 13.7, 18.8, 29.9
20 years: 3.2, 8.4, 8.5, 17.6, 26.7, 22.7, 23.3, 23.4, 41.2
21 years: (no entries)
25 years: 5.9, 9.0, 18.8, 12.6, 23.5, 38.7
30 years: 14.0, 29.3, 32.4, 25.6, 38.2, 51.8, 45.8, 60.7, 76.2
35 years: 2.0, 13.6, 22.4, 14.8, 29.3, 43.5, 40.5, 59.8, 73.9
40 years: 8.1, 31.5, 34.4, 23.6, 49.8, 60.3, 54.0, 65.6, 81.1
45 years: 4.6, 22.9, 20.5, 18.6, 38.1, 47.4, 53.5, 69.6, 78.6
50 years: 14.1, 35.8, 36.4, 32.9, 53.8, 61.8, 62.7, 70.6, 84.4
55 years: 4.2, 6.2, 16.0, 22.4, 27.4, 44.4, 52.5, 66.8
60 years: 14.1, 40.3, 39.1, 38.5, 60.6, 65.3, 73.6, 82.3, 90.2
65 years: 1.2, 15.3, 16.8, 21.0, 27.1, 36.9, 65.8, 74.1, 82.0
70 years: 8.1, 33.6, 27.0, 31.2, 50.4, 56.6, 74.0, 80.2, 87.9
75 years: 1.3, 6.3, 5.3, 13.8, 24.6, 31.6, 67.2, 73.7, 82.8
80 years: 22.4, 20.0, 20.8, 47.7, 56.6, 77.1, 84.2, 90.3
85 years: ...., ...., 3.6, 58.1, 63.3, 71.0
90 years: ...., 12.5, ...., 14.3, 47.1, 50.0, 74.1, 78.6, 90.9

¹ For the mainland of the United States.

² Persons of negro descent, Chinese, Japanese, and Indians.

On account of the constantly increasing number of foreign white
persons  reported  for  each  year  of  age,  no  attempt  was  made  in 
Table  1 to  measure  the  relative  excess  for  this  class  of  persons  under 
eighteen  years  of  age.  There  is  considerable  concentration  on 
eighteen  and  twenty  years  and  thereafter  on  the  five-  and  ten-year 
periods,  and  the  percentages  given  in  Table  1,  although  not  so 
conclusive  as  those  for  the  native  elements,  are  at  least  indicative 
of the extent to which concentration had lessened in 1900 as compared
with the two censuses preceding.

The  general  conclusion  drawn  by  the  Census  authorities,  from  a 
study  of  the  figures  of  Table  1 was  “that,  as  far  as  the  concentration 
on  certain  ages  is  concerned,  the  attempt  to  secure  in  1900  a return 
of  age  according  to  date  of  birth  was  partially  successful,  but  it  is 
apparent  that  concentration  cannot  be  wholly  avoided  in  any 
case.  This  is  particularly  true  with  respect  to  colored  persons,  and 
to  a somewhat  less  extent  with  respect  to  foreign  white  persons; 
and  for  these  two  elements  of  the  population  it  is  probable  that 
no  great  improvement  can  be  expected  under  the  present  conditions 
of  census  enumeration.” 

That  no  great  progress  has  been  made  since  in  this  matter  is 
indicated by the following statement in the reports of the Fourteenth
Census, 1920 (vol. ii, pp. 145, 146):

“Irregularities  of  this  character  are  due  in  large  part  to  errors  in  the  census 
returns.  These  errors  result  from  three  causes:  (1)  Some  persons  do  not  know 
their exact age. (2) The enumerators are obliged in many cases to obtain information
relating to the persons enumerated from a third person, either some
member  of  the  family  found  at  home  or  a person  in  charge  of  a hotel  or  boarding 
house,  who  can  give  the  age  only  approximately.  (3)  In  certain  instances, 
apparently more frequent among women than among men, the age is intentionally
misstated. Where the age is not accurately known there is a tendency
to  report  it  as  a multiple  of  2 or  of  5,  and  especially,  in  the  case  of  ages  above 
20,  as  a multiple  of  10.  There  is  also  a tendency  to  concentrate  on  age  21  for 
men.  In  general,  the  degree  of  inaccuracy  is  greater  for  adults  than  for  children 
and youths, and is greater for those classes of the population in which the proportion
of illiterates is greatest. The returns also undoubtedly exaggerate the
number  of  centenarians,  particularly  among  the  Negroes  and  Indians.” 
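
The tendency to report ages as multiples of 5 can be summarized in a single figure. One measure widely used in later demographic work, and not taken from the census reports quoted here, is Whipple's index: the number of persons returned at ages ending in 0 or 5 between 23 and 62 years, divided by one-fifth of all persons returned at ages 23 to 62, and multiplied by 100. The sketch below assumes single-year counts are available; the function name and the sample data are hypothetical.

```python
from typing import Mapping


def whipple_index(counts_by_age: Mapping[int, float]) -> float:
    """Whipple's index of concentration on ages ending in 0 or 5.

    100 indicates no preference for such ages; 500 indicates that every
    person aged 23 to 62 was returned at an age ending in 0 or 5.
    """
    ages = range(23, 63)  # ages 23 through 62 inclusive
    total = sum(counts_by_age.get(age, 0.0) for age in ages)
    if total <= 0:
        raise ValueError("no persons returned at ages 23 to 62")
    heaped = sum(counts_by_age.get(age, 0.0) for age in ages if age % 5 == 0)
    return 100.0 * heaped / (total / 5.0)


# Hypothetical returns: 100 persons at every age, with 60 extra at each
# age ending in 0 or 5 (illustrative, not census figures).
returns = {age: 100.0 + (60.0 if age % 5 == 0 else 0.0) for age in range(0, 100)}
print(round(whipple_index(returns), 1))
```

An index well above 100 would point to the kind of concentration the census reports describe, though the index says nothing of its cause.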

THE  REGISTRATION  METHOD 

The  theory  of  this  method  is  to  record  or  register  each  event 
in the ceaseless flow of the stream of life as, and when, it happens.
A mechanism is created in the body politic which makes certain
individuals  responsible  for  the  prompt  recording  of  each  event 
when  it  happens.  In  the  field  of  our  present  interest  it  is  the 
physician  who  is  thus  held  primarily  responsible  for  the  recording 
or  registering  with  some  central  authority  of  the  facts  about  births 
and deaths. If a person dies and no physician has been in attendance,
the record is caught up through the necessity of a burial
permit. The corpus of every deceased human being must be somehow
disposed of. The central registration authority in each locality
is the only person qualified to permit legal disposal. Therefore
substantially all deaths must get registered. In the case of
birth,  the  attending  physician  or  midwife  again  is  required  by  law 
to  report  the  fact.  Unfortunately,  if  the  birth  has  not  been  attended 
by  anybody  but  the  mother  and  infant,  it  is  not  so  easy  as  in  the 
case  of  death  to  catch  the  record.  There  are  growing  up,  however, 
various  legal  necessities  for  the  possession  of  a birth  certificate, 
so that ultimately the registration of births should become something
like as accurate as the registration of deaths.

The  heuristic  advantages  of  the  registration  over  the  census 
method  are  apparent.  The  course  of  events  can  be  followed. 
Registration  gives  us  such  knowledge  as  we  have  of  births,  deaths, 
sickness,  marriages,  divorces,  etc.,  so  far  as  concerns  population 
aggregates. 

THE  AD  HOC  OR  CASE  RECORD  METHOD 

This  is  the  ordinary  method  of  science  in  general  for  getting  a 
collection  of  pertinent  quantitative  data.  In  a defined  universe 
of  interest  cases  are  recorded  in  respect  of  the  points  or  attributes  of 
interest. Thus one may record in all cases of typhoid fever the
age,  stature,  body  weight,  daily  temperature,  etc.,  of  the  individual. 
Logically  considered,  it  is  a combination  of  the  essential  features 
of  the  census  and  the  registration  method  confined  to  a particular 
universe  of  interest.  In  a later  chapter  more  will  be  said  of  the 
making  of  medical  records. 
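
A case record of the kind just described can be pictured as a small data structure. The sketch below is only an illustration: the class name, field names, and units are assumptions chosen for the typhoid example, not a form prescribed by the text.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import List


@dataclass
class TyphoidCaseRecord:
    """One case in a defined universe of interest (hypothetical layout)."""
    case_id: str
    age_years: int
    stature_cm: float
    body_weight_kg: float
    # One reading per day of observation, in degrees Celsius.
    daily_temperature_c: List[float] = field(default_factory=list)

    def mean_temperature(self) -> float:
        """Average of the recorded daily temperatures."""
        if not self.daily_temperature_c:
            raise ValueError("no temperatures recorded for this case")
        return mean(self.daily_temperature_c)


# Hypothetical case, for illustration only.
case = TyphoidCaseRecord("case-001", 24, 170.0, 61.5, [39.0, 40.0, 38.0])
print(case.mean_temperature())
```

The fixed attributes are taken once for each case, as in a census, while the temperature series grows day by day, as in registration.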

OFFICIAL  REGISTRATION  RECORDS 

There  are  reproduced  below  in  reduced  facsimile  the  standard 
birth  and  death  registration  certificates  as  used  in  the  United  States 

DEPARTMENT OF COMMERCE
BUREAU OF THE CENSUS

STANDARD CERTIFICATE OF BIRTH

State File No. ....
Registered No. ....

1. PLACE OF BIRTH: County ....; State ....; Township .... or Village ....; City ....; No. ...., .... St.; .... Ward
   (If birth occurred in a hospital or institution, give its NAME instead of street and number)
2. Full name of child .... (If child is not yet named, make supplemental report, as directed)
3. Sex
   If plural births:
4. Twin, triplet, or other
5. Number, in order of birth
6. Premature .... Full term ....
7. Legitimate?
8. Date of birth (Month, day, year) ...., 19..

FATHER

9. Full name
10. Residence (usual place of abode) (If nonresident, give place and State)
11. Color or race
12. Age at last birthday .... (Years)
13. Birthplace (city or place) (State or country)
14. Trade, profession, or particular kind of work done, as spinner, sawyer, bookkeeper, etc.
15. Industry or business in which work was done, as silk mill, sawmill, bank, etc.
16. Date (month and year) last engaged in this work ...., 19..
17. Total time (years) spent in this work

MOTHER

18. Full maiden name
19. Residence (usual place of abode) (If nonresident, give place and State)
20. Color or race
21. Age at last birthday .... (Years)
22. Birthplace (city or place) (State or country)
23. Trade, profession, or particular kind of work done, as housekeeper, typist, nurse, clerk, etc.
24. Industry or business in which work was done, as own home, lawyer’s office, silk mill, etc.
25. Date (month and year) last engaged in this work ...., 19..
26. Total time (years) spent in this work
27. Number of children of this mother (At time of this birth and including this child): (a) Born alive and now living; (b) Born alive but now dead
28. If stillborn, period of gestation (months or weeks)
29. Cause of stillbirth (Before labor .... During labor ....)

CERTIFICATE OF ATTENDING PHYSICIAN OR MIDWIFE

I hereby certify that I attended the birth of this child, who was .... (Born alive or stillborn) at .... m. on the date above stated.
(When there was no attending physician or midwife, then the father, householder, etc., should make this return.)
(Signed) ...., M. D. or ...., Midwife
Address ....
Given name added from a supplemental report .... (Date of) ...., Registrar.
Filed ...., 19.. ...., Registrar.


UNITED  STATES  STANDARD  CERTIFICATE  OF  BIRTH 


Why  births  should  be  registered. — There  is  hardly  a relation  of  life,  social,  legal,  or  economic,  in  which  the  evidence 
furnished  by  an  accurate  registration  of  births  may  not  prove  to  be  of  the  greatest  value,  not  only  to  the  individual  but 
also  to  the  public  at  large.  It  is  not  only  an  act  of  civilization  to  register  birth  certificates  but  good  business,  for  they  are 
frequently  used  in  many  practical  ways,  some  of  which  are  listed  below: 


(1)  As  evidence  to  prove  the  age  and  legitimacy  of  heirs; 

(2)  As  proof  of  age  to  determine  the  validity  of  a contract 
entered  into  by  an  alleged  minor; 

(3)  As  evidence  to  establish  age  and  proof  of  citizenship 
and  descent  in  order  to  vote; 

(4)  As  evidence  to  establish  the  right  of  admission  to  the 
professions  and  to  many  public  offices; 

(5)  As  evidence  of  legal  age  to  marry; 

(6)  As  evidence  to  prove  the  claims  of  widows  and  orphans 
under  the  widows’  and  orphans’  pension  law; 

(7)  As  evidence  to  determine  the  liability  of  parents  for 
the  debts  of  a minor; 


(8) As evidence in the administration of estates, the settlement
of insurance and pensions;

(9)  As  evidence  to  prove  the  irresponsibility  of  children 
under  legal  age  for  crime  and  misdemeanor,  and  various 
other  matters  in  the  criminal  code; 

(10)  As  evidence  in  the  enforcement  of  law  relating  to 
education  and  to  child  labor; 

(11)  As  evidence  to  determine  the  relations  of  guardians 
and  wards; 

(12)  As  proof  of  citizenship  in  order  to  obtain  a passport; 

(13)  As  evidence  in  the  claim  for  exemption  from  or  the 
right  to  jury  and  military  service. 


Statement  of  occupation. — Make  some  entry  in  this  section  for  each  parent.  For  a woman  whose  only  occupation  is 
that  of  home  housework,  write  housework  in  answer  to  Question  23  and  own  home  in  answer  to  Question  24.  For  a person 
engaged  in  domestic  service  for  wages,  however,  designate  the  occupation  by  the  appropriate  terms,  as  housekeeper — private 
family,  cook — hotel,  etc.  For  a person  who  has  no  occupation  whatever  write  none. 

To  be  complete,  an  occupation  return  must  state: 

14  and  23. — The  trade,  profession,  or  particular  kind  of  work  done. 

15  and  24. — The  industry  or  business  in  which  the  work  is  done. 

16  and  25. — The  month  and  year  the  person  last  worked  at  the  occupation. 

17  and  26. — The  number  of  years  the  person  followed  the  occupation. 

In  stating  the  occupation,  avoid  the  use  of  such  indefinite  terms  as  “employee,”  “worker,”  “operative,”  etc.  Find 
out  the  particular  kind  of  work  done  and  return  that,  as  spinner,  weaver,  etc. 

In stating the industry or business, avoid the use of such general terms as “store,” “factory,” “mill,” etc. State the
particular  kind  of  store,  factory,  mill,  etc.,  as  grocery  store,  soap  factory,  cotton  mill,  etc. 

Distinguish  carefully  the  different  kinds  of  engineers  by  stating  the  full  descriptive  titles,  as  civil  engineer,  mechanical 
engineer,  mining  engineer,  stationary  engineer,  etc.  Avoid  the  term  “laborer”  when  a more  precise  statement  of  occupation 
can be secured. Do not use the word “mechanic,” but give the exact occupation, as carpenter, painter, machinist, etc.
Distinguish  carefully  between  retail  merchants  and  wholesale  merchants.  A person  who  sells  goods  should  be  called  a 
salesman  and  not  a clerk. 
Registration Areas. They are introduced here in order that the
reader  may  understand  clearly  what  information  is  basically  available 
in  official  vital  statistics  in  the  United  States.  In  actual  practice  the 
extent  to  which  the  different  items  on  the  certificates  are  filled  out 
depends  upon  the  force  and  vigilance  of  the  registration  officials.  In 
some  communities  there  is  a good  deal  of  laxity  in  regard  to  such 
items as occupation, birthplace of parents, etc. But if the registration
officials are sufficiently active and painstaking in their duties,
all of the information called for on the certificates can be had.

DEPARTMENT OF COMMERCE
BUREAU OF THE CENSUS

STANDARD CERTIFICATE OF DEATH

Registered No. ....

1. PLACE OF DEATH: County ....; State ....; Township .... or Village ....; City ....; No. ...., .... St.; .... Ward
   (If death occurred in a hospital or institution, give its NAME instead of street and number)
   Length of residence in city or town where death occurred .... yrs. .... mos. .... ds. How long in U. S., if of foreign birth? .... yrs. .... mos. .... ds.
2. FULL NAME
   (a) Residence: No. ...., .... St., .... Ward (Usual place of abode) (If nonresident give city or town and State)

PERSONAL AND STATISTICAL PARTICULARS

3. SEX
4. COLOR OR RACE
5. Single, Married, Widowed, or Divorced (write the word)
5a. If married, widowed, or divorced: HUSBAND of (or) WIFE of
6. DATE OF BIRTH (month, day, and year)
7. AGE: Years ...., Months ...., Days ....; if LESS than 1 day, .... hrs. or .... min.

OCCUPATION

8. Trade, profession, or particular kind of work done, as spinner, sawyer, bookkeeper, etc.
9. Industry or business in which work was done, as silk mill, sawmill, bank, etc.
10. Date deceased last worked at this occupation (month and year)
11. Total time (years) spent in this occupation

12. BIRTHPLACE (city or town) (State or country)

FATHER

13. NAME
14. BIRTHPLACE (city or town) (State or country)

MOTHER

15. MAIDEN NAME
16. BIRTHPLACE (city or town) (State or country)

17. INFORMANT (Address)
18. BURIAL, CREMATION, OR REMOVAL: Place ....; Date ...., 19..
19. UNDERTAKER (Address)
20. FILED ...., 19.. ...., Registrar.

MEDICAL CERTIFICATE OF DEATH

21. DATE OF DEATH (month, day, and year) ...., 19..
22. I HEREBY CERTIFY, That I attended deceased from ...., 19.., to ...., 19.. I last saw h.... alive on ...., 19..; death is said to have occurred on the date stated above, at .... m.
    The principal cause of death and related causes of importance in order of onset were as follows: .... (Date of onset)
    Contributory causes of importance not related to principal cause: ....
    Name of operation .... Date of ....
    What test confirmed diagnosis? .... Was there an autopsy? ....
23. If death was due to external causes (violence) fill in also the following:
    Accident, suicide, or homicide? .... Date of injury ...., 19..
    Where did injury occur? .... (Specify city or town, county, and State)
    Specify whether injury occurred in industry, in home, or in public place.
    Manner of injury ....
    Nature of injury ....
24. Was disease or injury in any way related to occupation of deceased? If so, specify ....
    (Signed) ...., M. D.
    (Address) ....

The  student  of  vital  statistics  should  study  the  birth  and  death 
certificates  with  the  most  painstaking  care.  Indeed,  he  will  find 
it  advantageous  to  learn  by  heart  every  word  and  punctuation 
mark  on  them.  From  the  certificates  comes  the  raw  material  with 
which  he  is  compelled  to  work.  Whenever  he  deals  with  birth  or 
death  rates,  in  whatever  connection,  there  should  be  a clear  picture 
in  his  mind  as  to  exactly  how  the  basic  data  were  got,  and  what 
they  mean  in  the  individual  case. 



The  latest  (1930)  improved  standard  forms  of  birth  certificates 
and  death  certificates,  as  officially  approved  by  the  Census  Bureau 
of  the  United  States,  are  shown  on  pages  73-75.  In  both  cases  the 
printed  matter  on  the  reverse  of  the  certificate  is  reproduced,  as 
well  as  the  material  on  the  face. 

The  new  death  certificate  embodies  a number  of  improvements 
over  the  one  formerly  used.  These  chiefly  concern  greater  detail 

UNITED  STATES  STANDARD  CERTIFICATE  OF  DEATH 


Statement  of  occupation. — Precise  statement  of  occupation  is  very  important,  so  that  the  relative  healthfulness  of  various 
pursuits  can  be  known.  Make  some  entry  in  this  section  for  every  person  aged  10  years  or  over.  If  the  occupation  had  been 
given  up  or  changed  on  account  of  the  disease  causing  death,  report  the  occupation  prior  to  illness.  If  the  deceased  had  retired 
from  business,  report  the  occupation  prior  to  retirement.  Children  not  gainfully  employed  may  be  returned  as  at  school  or 
at home. For a woman whose only occupation was that of home housework, write housework in answer to Question 8 and
own  home  in  answer  to  Question  9.  For  a person  engaged  in  domestic  service  for  wages,  however,  designate  the  occupation  by 
the  appropriate  terms,  as  housekeeper — private  family,  cook — hotel,  etc.  For  a person  who  had  no  occupation  whatever  write  none. 

To  be  complete,  an  occupation  return  must  state: 

8.  — The  trade,  profession,  or  particular  kind  of  work  done. 

9.  — The  industry  or  business  in  which  the  work  was  done. 

10.  — The  month  and  year  the  deceased  last  worked  at  the  occupation. 

11.  — The  number  of  years  the  deceased  followed  the  occupation. 

In stating the occupation, avoid the use of such indefinite terms as "employee," "worker," "operative," etc. Find out
the particular kind of work done and return that, as spinner, weaver, etc.

In stating the industry or business, avoid the use of such general terms as "store," "factory," "mill," etc. State the
particular kind of store, factory, mill, etc., as grocery store, soap factory, cotton mill, etc.

Distinguish carefully the different kinds of engineers by stating the full descriptive titles, as civil engineer, mechanical
engineer, mining engineer, stationary engineer, etc. Avoid the term "laborer" when a more precise statement of the occupation
can be secured. Do not use the word "mechanic," but give the exact occupation, as carpenter, painter, machinist, etc.
Distinguish  carefully  between  retail  merchants  and  wholesale  merchants.  A person  who  sells  goods  should  be  called  a salesman 
and  not  a clerk. 
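
The rules for the occupation return lend themselves to a mechanical check. The following is a minimal sketch only, not part of the certificate: the field names, the function name, and the treatment of an entry of "none" as complete are assumptions made for illustration. The lists of indefinite terms are taken from the instructions above.

```python
from dataclasses import dataclass
from typing import List, Optional

# Indefinite terms that the certificate instructions warn against.
VAGUE_OCCUPATIONS = {"employee", "worker", "operative", "laborer", "mechanic"}
VAGUE_INDUSTRIES = {"store", "factory", "mill"}


@dataclass
class OccupationReturn:
    """Items 8 to 11 of the certificate (field names are hypothetical)."""
    trade: Optional[str] = None           # Item 8: trade, profession, or kind of work
    industry: Optional[str] = None        # Item 9: industry or business
    last_worked: Optional[str] = None     # Item 10: month and year last worked
    years_followed: Optional[int] = None  # Item 11: years in the occupation


def review_occupation(ret: OccupationReturn) -> List[str]:
    """Return a list of problems; an empty list means nothing was flagged."""
    # Assumption: an explicit "none" (no occupation whatever) needs no further items.
    if ret.trade is not None and ret.trade.strip().lower() == "none":
        return []
    problems = []
    if not ret.trade:
        problems.append("item 8 missing")
    elif ret.trade.strip().lower() in VAGUE_OCCUPATIONS:
        problems.append("item 8 too indefinite: " + ret.trade)
    if not ret.industry:
        problems.append("item 9 missing")
    elif ret.industry.strip().lower() in VAGUE_INDUSTRIES:
        problems.append("item 9 too general: " + ret.industry)
    if not ret.last_worked:
        problems.append("item 10 missing")
    if ret.years_followed is None:
        problems.append("item 11 missing")
    return problems


complete = OccupationReturn("spinner", "cotton mill", "May 1929", 12)
vague = OccupationReturn("Worker", "factory")
```

A return such as `complete` is not flagged, while `vague` is flagged on all four items. Only exact matches are treated as indefinite, so that "cotton mill" passes while a bare "mill" does not.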

Statement  of  cause  of  death. — Cause  of  death  means  the  disease,  injury,  or  complication  which  causes  death,  not  the  mode 
of  dying,  e.  g.,  heart  failure,  asphyxia,  asthenia,  etc.  As  principal  cause  name  the  disease  or  injury  causing  death.  As  related 
causes,  name  earlier  morbid  conditions,  if  any,  related  to  the  principal  cause  and  any  important  complication  of  the  principal 
cause.  Under  contributory  causes  of  importance  not  related  to  principal  cause,  name  other  important  diseases  or  injuries. 
Examples: 


Example I

The principal cause of death and related causes
of importance in order of onset were as follows:

                                        Date of onset
    Arteriosclerosis                    1915
    Chronic interstitial nephritis      1921
    Cerebral hemorrhage                 July 5, 1927

Contributory causes of importance not related
to principal cause:

    Fracture of arm
    Automobile accident                 May [day illegible], 1927

Example II

The principal cause of death and related causes
of importance in order of onset were as follows:

                                        Date of onset
    Attack of epilepsy                  1 week ago
    Ran over by street car              1 week ago
    Peritonitis                         [number illegible] days ago

Contributory causes of importance not related
to principal cause:

    Influenza                           6 weeks ago

In  a group  of  causes  containing  the  principal  cause  and  related  causes,  the  causes  should  be  given  in  the  order  of  onset,  so 
that in a group of three causes the principal cause may appear in either first, second, or third position. The principal cause
in  each  of  the  above  examples  happens  to  be  the  second  cause  given. 
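
The ordering rule can be illustrated with the causes of Example I. This is a sketch under stated assumptions: the `Cause` type and both function names are hypothetical, and a missing month or day is represented by 0 so that a bare year sorts before any fuller date in the same year. Chronic interstitial nephritis is marked as the principal cause because the certificate states that the principal cause is the second one given.

```python
from typing import List, NamedTuple, Tuple


class Cause(NamedTuple):
    name: str
    onset: Tuple[int, int, int]  # (year, month, day); 0 where month or day is not given
    principal: bool = False


def in_order_of_onset(causes: List[Cause]) -> List[Cause]:
    """Sort causes by date of onset, earliest first."""
    return sorted(causes, key=lambda c: c.onset)


def principal_position(causes: List[Cause]) -> int:
    """Return the 1-based position of the principal cause after ordering by onset."""
    ordered = in_order_of_onset(causes)
    positions = [i for i, c in enumerate(ordered, start=1) if c.principal]
    if len(positions) != 1:
        raise ValueError("exactly one principal cause is expected")
    return positions[0]


# Example I from the certificate, deliberately entered out of order.
example_1 = [
    Cause("Cerebral hemorrhage", (1927, 7, 5)),
    Cause("Arteriosclerosis", (1915, 0, 0)),
    Cause("Chronic interstitial nephritis", (1921, 0, 0), principal=True),
]
```

Here `principal_position(example_1)` gives 2, in agreement with the statement on the certificate. The position is a consequence of the dates of onset and carries no meaning of its own.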


ADDITIONAL  SPACE  FOR  FURTHER  STATEMENTS  BY  PHYSICIAN 


in regard to primary, subsequent, and contributory causes; with
the occupational relationships to death; and with the more scientific
or objective support of cause of death by postmortem, operative,
or laboratory evidence: chemical, bacteriological, or biological.
These improved certificates may reasonably be expected, with the
passage of time, to furnish a mass of data on mortality of presumably
superior accuracy as to cause of death, as compared with anything
hitherto available.

In some localities a special certificate is used to record still-births
on the ground that "there is merit in requiring a birth and
death certificate for children born alive and soon dying, and a
different form to record viable products of conception stillborn."
The form of still-birth certificate used in New York City is as follows:

25-2606-28-B


Department  of  Health  of  The  City  of  New  York 

BUREAU  OF  RECORDS 


No  foetus  of  any  period  of  uterine  gestation  should  be  interred  or  disposed  of  in  any  other 
manner  without  a permit  therefor  having  been  obtained  from  the  Department  of  Health,  such  permit 
to  be  granted  upon  the  presentation  of  a proper  return. 

Persons  who  are  unable  or  unwilling  for  any  reason  to  bury  a foetus  should  immediately 
notify  the  Department  of  Health,  which  Department  will  see  that  the  foetus  is  properly  and  promptly 
buried in the City Cemetery.

CERTIFICATE  OF  A STILL-BIRTH 


The  death  of  an  infant  that  has  breathed  must  not  be  reported  as  a still-birth;  such  cases 
must  be  reported  by  filing  a certificate  of  birth  and  a certificate  of  death. 

Borough of ........ Registered No. ........

Character of premises, whether tenement, private, hotel, hospital or other place, etc. ........

Sex ........ Color or Race ........ Date of Still-Birth ........ 192..
(Month) (Day)


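                     Father          Mother
Name                 ........        ........
Residence            ........        ........
Birthplace           ........        ........
Age                  ........        ........
Occupation           ........        ........
Color or Race        ........        ........

Period of Utero Gestation ........

Number of Previous Pregnancies ........    Number of Living Births ........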

I hereby certify that the foregoing particulars are correct as near as the same can be ascertained,
and I further certify that I attended at this still-birth; that the still-birth occurred on the
........ day of ........ 192.., that the actual cause of the death of this foetus was
........ and that said death of foetus occurred during labor.


Predisposing  cause 


Filed ........ 192..

Address ........

Place of Burial ........

Date  of  Burial 

Undertaker 

Address 

STILL-BIRTH  PROCEDURE  FOR  MIDWIVES 

Should  the  child  not  breathe  after  birth,  the  midwife  must  report  the  fact  at  once,  by 
telephone  or  messenger,  to  the  Department  of  Health,  when  an  inspector  will  visit  the  case  and 
countersign  the  still-birth  certificate  which  the  midwife  must  leave  at  the  home. 

The  foetus  must  not  be  removed  from  the  premises  until  this  certificate  has  been  approved 
by  the  inspector  from  the  Department  of  Health  and  a permit  has  been  issued  by  the  Bureau  of 
Records. 

I hereby certify that I have been employed as undertaker by ........,
the ........ (relationship) of deceased. This statement is made to obtain a permit for the
burial or cremation of the remains of deceased.


Signature ........

THE INTERNATIONAL LIST OF THE CAUSES OF DEATH

If the statistics of mortality are to be comparable from locality
to locality, it is plain that a uniform system of nomenclature of the
causes of death must everywhere be used. Similarly, if hospital records
are to be comparable, a uniform system of nomenclature of morbid
conditions and of treatments and results must be in operation.

Fig. 12. Portrait of Dr. Jacques Bertillon. (Reproduced through the kindness
of Dr. Frederick L. Hoffman, to whom the original belongs, and Brig.-Gen. Robert E.
Noble, Librarian of the Surgeon-General's office.)

The science of nosology, or the classification of disease, attracted
a great deal more attention from medical men a century ago than
it does now. The predominant system in vogue for a long time was
due to Cullen. The first attempt to adapt it specifically to statistical
uses was due to William Farr. In the First Annual Report
of the Registrar-General of England and Wales Farr said:

“The  advantages  of  a uniform  statistical  nomenclature,  however  imperfect,  are 
so  obvious  that  it  is  surprising  no  attention  has  been  paid  to  its  enforcement  in  Bills  of 
Mortality.  Each  disease  has  in  many  instances  been  denoted  by  three  or  four  terms, 
and  each  term  has  been  applied  to  as  many  different  diseases;  vague,  inconvenient 
names  have  been  employed,  or  complications  have  been  registered  instead  of  primary 
diseases.  The  nomenclature  is  of  as  much  importance  in  this  department  of  inquiry 
as weights and measures in the physical sciences, and should be settled without delay.”

The First Statistical Congress, held in Brussels in 1853, selected
Farr  and  Marc  d’Espine  of  Geneva  to  draw  up  a report  upon  a 
classification  adapted  to  international  use.  It  is  interesting  to  note 
that  the  resolution  to  this  end  was  introduced  in  the  Congress  by 
Dr.  Achille  Guillard,  who  was  the  maternal  grandfather  of  Dr. 
Jacques  Bertillon.  In  the  last  quarter  of  a century  Bertillon  has 
been  perhaps  more  active  than  anyone  else  in  perfecting  and 
extending  the  use  of  the  International  Classification. 

The  classification  prepared  by  Farr  and  d’Espine  was  adopted 
in  Paris  in  1855,  in  Vienna  in  1857,  and  was  translated  into  six 
languages.  It  was  revised  in  1864,  1874,  1880,  and  1886.  With 
further  revision  it  was  adopted  by  the  International  Statistical 
Institute in Chicago in 1893, and provisions were made for decennial
revisions. The first of these was made in 1900, the second
in  1909,  the  third  in  1920,  and  the  most  recent  one  in  1929. 

The  present  form  of  the  International  List,  after  its  latest 
revision,  is  as  follows: 

INTERNATIONAL  LIST  OF  CAUSES  OF  DEATH 

(Fourth Decennial Revision by the International Commission, Paris, October, 1929.)

(The numbers in the list represent obligatory divisions. The subdivisions
indicated by letters a, b, c, etc., are optional. When a cause of death is obligatorily
divided among several numbers, it is essential to reserve in the tables a line
for the total, relative to this cause. Example: Tuberculosis (all forms) Nos.
23 to 32.)
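
The rule about reserving a line for the total may be sketched as follows. The counts are invented for illustration and the function name is hypothetical. Only the range of numbers, 23 to 32 for tuberculosis, comes from the note above.

```python
from typing import Dict, List, Tuple


def lines_with_total(
    deaths_by_number: Dict[int, int], first: int, last: int, label: str
) -> List[Tuple[str, int]]:
    """Build one table line per list number, followed by a line for the total."""
    rows = [(f"No. {n}", deaths_by_number.get(n, 0)) for n in range(first, last + 1)]
    total = sum(count for _, count in rows)
    rows.append((label, total))
    return rows


# Hypothetical tallies keyed by number in the detailed list.
deaths_by_number = {23: 120, 24: 9, 25: 6, 27: 2, 32: 4, 33: 1}

tuberculosis_rows = lines_with_total(
    deaths_by_number, 23, 32, "Tuberculosis (all forms), Nos. 23 to 32"
)
```

Numbers with no recorded deaths are kept as lines with a count of zero, so that the table has the same shape from locality to locality. No. 33 (leprosy) lies outside the range and does not enter the total.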

I.  Infectious  and  Parasitic  Diseases 

1.  Typhoid  fever. 

2.  Paratyphoid  fever. 

3.  Typhus  fever. 

4.  Relapsing  fever. 

5.  Undulant  fever. 

6.  Smallpox: 

(a)  Variola  major. 

(b) Variola minor, alastrim.

(c)  Not  specified. 

7.  Measles. 

8.  Scarlet  fever. 

9.  Whooping  cough. 

10.  Diphtheria. 

11.  Influenza: 

(a)  With  respiratory  complications  specified. 

(b) Without respiratory complications specified.

12. Cholera.

13.  Dysentery: 

(a)  Amebic. 

(b) Bacillary.

(c)  Unspecified  or  due  to  other  causes. 

14.  Plague: 

(a)  Bubonic. 

(b)  Pneumonic. 

(c)  Septicemic. 

(d)  Unspecified. 

15.  Erysipelas. 

16.  Acute  poliomyelitis  and  acute  polio-encephalitis. 

17.  Lethargic  or  epidemic  encephalitis. 

18.  Epidemic  cerebrospinal  meningitis. 

19.  Glanders. 

20.  Anthrax  (Bacillus  anthracis),  malignant  pustule. 

21.  Rabies. 

22.  Tetanus. 

23.  Tuberculosis  of  the  respiratory  system. 

24.  Tuberculosis  of  the  meninges  and  central  nervous  system. 

25. Tuberculosis of the intestines and peritoneum (including the mesenteric and retroperitoneal glands).

26.  Tuberculosis  of  the  vertebral  column. 

27.  Tuberculosis  of  the  bones  and  joints  (vertebral  column  excepted): 

(a)  Bones. 

(b)  Joints. 

28.  Tuberculosis  of  the  skin  and  subcutaneous  cellular  tissue. 

29. Tuberculosis of the lymphatic system (bronchial, mesenteric, and retroperitoneal glands excepted).

30.  Tuberculosis  of  the  genito-urinary  system. 

31.  Tuberculosis  of  other  organs. 

32.  Disseminated  tuberculosis: 

(a) Acute.

(b)  Chronic. 

(c)  Unspecified. 

33.  Leprosy. 

34.  Syphilis: 

(a)  Congenital. 

(b) Acquired.

(c)  Unspecified. 

35.  Gonococcus  infection  and  other  venereal  diseases. 

36.  Purulent  infection,  septicemia,  non-puerperal: 

(a)  Septicemia. 

(b) Pyemia or pyohemia.

(c)  Gas  gangrene. 

37.  Yellow  fever. 

38.  Malaria: 

(a) Malarial fever.

(b) Malarial cachexia.

39. Other diseases due to protozoal parasites.

40.  Ancylostomiasis. 

41.  Hydatid  cysts: 

(a) Of the liver.

(b) Of other organs.

42.  Other  diseases  caused  by  helminths. 

43.  Mycoses. 

44.  Other  infectious  and  parasitic  diseases: 

(a)  Chicken-pox. 

(b) German measles.

(c)  Others  under  this  title. 

II.  Cancers  and  Other  Tumors 

45.  Cancer  and  other  malignant  tumors  of  the  buccal  cavity  and  pharynx: 

(a)  Lip. 

(b) Tongue.

(c)  Mouth. 

(d)  Jaw. 

(e)  Other  and  unspecified  parts  of  buccal  cavity. 

(f) Pharynx.

46. Cancer and other malignant tumors of the digestive tract and peritoneum:

(a) Esophagus.

(b) Stomach and duodenum.

(c)  Intestine  (except  rectum  and  anus). 

(d)  Rectum  and  anus. 

(e)  Liver  and  biliary  passages. 

(f) Pancreas.

(g)  Peritoneum. 

(h)  Others. 

47.  Cancer  and  other  malignant  tumors  of  the  respiratory  system: 

(a)  Larynx. 

(b)  Lungs  and  pleura. 

(c)  Others. 

48.  Cancer  and  other  malignant  tumors  of  the  uterus. 

49.  Cancer  and  other  malignant  tumors  of  other  female  genital  organs: 

(a)  Ovary  and  Fallopian  tube. 

(b)  Vagina  and  vulva. 

(c) Others.

50.  Cancer  and  other  malignant  tumors  of  the  breast. 

51. Cancer and other malignant tumors of the male genito-urinary organs:

(a)  Kidneys  and  suprarenals  (male). 

(b)  Bladder  (male). 

(c)  Prostate. 

(d)  Testes. 

(e)  Scrotum. 

(f) Others.

52. Cancer and other malignant tumors of the skin.

53. Cancer and other malignant tumors of other or unspecified organs:

(a) Kidneys and suprarenals (female).

(b) Bladder (female).

(c) Brain.

(d) Bones (except jaw).

(e) Others.

54. Non-malignant tumors:

(a) Of the ovary.

(b) Of the uterus.

(c) Of other female genital organs.

(d) Of the brain.

(e)  Of  other  organs. 

55.  Tumors  of  which  the  nature  is  not  specified: 

(a)  Of  the  ovary. 

(b)  Of  the  uterus. 

(c)  Of  other  female  genital  organs. 

(d)  Of  the  brain. 

(e)  Of  other  organs. 

III. Rheumatic Diseases, Nutritional Diseases, Diseases of the Endocrine Glands and Other General Diseases

56.  Acute  rheumatic  fever. 

57.  Chronic  rheumatism,  osteo-arthritis. 

58.  Gout. 

59.  Diabetes  mellitus. 

60.  Scurvy: 

(a)  Infantile  scurvy  (Barlow’s  disease). 

(b) Scurvy.

61.  Beriberi. 

62.  Pellagra. 

63.  Rickets. 

64.  Osteomalacia. 

65.  Diseases  of  the  pituitary  body. 

66.  Diseases  of  the  thyroid  and  parathyroid  glands: 

(a)  Simple  goiter. 

(b)  Exophthalmic  goiter. 

(c) Myxedema and cretinism.

(d)  Tetany. 

(e)  Others. 

67.  Diseases  of  the  thymus  gland. 

68.  Diseases  of  the  adrenals  (Addison’s  disease;  not  specified  as  tuberculous). 

69.  Other  general  diseases. 

IV.  Diseases  of  the  Blood  and  Blood-making  Organs 

70.  Hemorrhagic  conditions: 

(a)  Primary  purpuras. 

(b) Hemophilia.

71. Anemias:

(a)  Pernicious  anemia. 

(b)  Others. 

72.  Leukemias  and  pseudoleukemias: 

(a)  True  leukemias. 

(b)  Pseudoleukemias  (Hodgkin’s  disease). 

73.  Diseases  of  the  spleen. 

74.  Other  diseases  of  the  blood  and  blood-making  organs. 

V.  Chronic  Poisonings  and  Intoxications 

75.  Alcoholism  (acute  or  chronic). 

76.  Chronic  poisoning  by  mineral  substances: 

(a)  Lead. 

(b)  Occupational  (except  lead). 

(c)  Others. 

77.  Chronic  poisoning  by  organic  substances: 

(a)  Occupational. 

(b)  Others. 

VI.  Diseases  of  the  Nervous  System  and  of  the  Organs  of  Special  Sense 

78.  Encephalitis  (non-epidemic): 

(a)  Abscess  of  the  brain. 

(b) Others.

79.  Meningitis: 

(a)  Simple  meningitis. 

(b)  Non-epidemic  cerebrospinal  meningitis. 

80.  Progressive  locomotor  ataxia  (tabes  dorsalis). 

81.  Other  diseases  of  the  spinal  cord. 

82.  Cerebral  hemorrhage,  cerebral  embolism  and  thrombosis: 

(a)  Cerebral  hemorrhage. 

(b)  Cerebral  embolism  and  thrombosis. 

(c)  Hemiplegia  and  causes  unspecified. 

83.  General  paralysis  of  the  insane. 

84.  Dementia  praecox  and  other  psychoses: 

(a)  Dementia  praecox. 

(b)  Other  psychoses. 

85.  Epilepsy. 

86.  Convulsions  (under  five  years  of  age). 

87.  Other  diseases  of  the  nervous  system: 

(a)  Softening  of  the  brain. 

(b)  Neuralgia  and  neuritis. 

(c)  Others. 

88.  Diseases  of  the  organs  of  vision. 

89.  Diseases  of  the  ear  and  of  the  mastoid  process: 

(a)  Otitis. 

(b) Diseases of the mastoid process.

(c) Others.

VII. Diseases of the Circulatory System

90.  Pericarditis. 

91.  Acute  endocarditis: 

(a)  Specified  as  acute. 

(b) Unspecified (under forty-five years of age).

92. Chronic endocarditis, valvular diseases:

(a) Endocarditis specified as chronic and valvular disease.

(b) Endocarditis unspecified (forty-five years and over).

93. Diseases of the myocardium:

(a) Acute myocarditis.

(b) Myocarditis unspecified (under forty-five years of age).

(c) Chronic myocarditis and myocardial degeneration.

(d)  Unspecified. 

94.  Diseases  of  the  coronary  arteries  and  angina  pectoris: 

(a)  Angina  pectoris. 

(b) Diseases of the coronary arteries.

95. Other diseases of the heart:

(a) Functional diseases of the heart.

(b) Other and unspecified.

96.  Aneurysm  (except  of  the  heart). 

97.  Arteriosclerosis  (diseases  of  the  coronary  arteries  excepted). 

98.  Gangrene  (not  gas  gangrene,  see  36c): 

(a)  Senile. 

(b)  Others. 

99.  Other  diseases  of  the  arteries. 

100.  Diseases  of  the  veins  (varices,  hemorrhoids,  phlebitis,  etc.;  not  phlegmasia 

alba  dolens,  see  148a). 

101.  Diseases  of  the  lymphatic  system  (lymphangitis,  etc.). 

102.  Idiopathic  anomalies  of  the  blood  pressure. 

103.  Other  diseases  of  the  circulatory  system. 

VIII.  Diseases  of  the  Respiratory  System 

104.  Diseases  of  the  nasal  fossae  and  annexa. 

105.  Diseases  of  the  larynx. 

106.  Bronchitis: 

(a)  Acute. 

(b)  Chronic. 

(c)  Unspecified. 

107.  Bronchopneumonia  (including  capillary  bronchitis): 

(a)  Bronchopneumonia. 

(b)  Capillary  bronchitis. 

108.  Lobar  pneumonia. 

109.  Pneumonia,  unspecified. 

110.  Pleurisy. 

111.  Congestion,  edema,  embolism,  hemorrhagic  infarct,  and  thrombosis  of  the 

lungs: 

(a)  Pulmonary  embolism  and  thrombosis. 

(b) Others.

112. Asthma.

113.  Pulmonary  emphysema. 

114.  Other  diseases  of  the  respiratory  system  (tuberculosis  excepted): 

(a) Chronic interstitial pneumonia including occupational diseases of the respiratory system.

(b) Others, including gangrene of the lung.

IX.  Diseases  of  the  Digestive  System 

115.  Diseases  of  the  buccal  cavity  and  annexa  and  of  the  pharynx  and  tonsils 

(including  adenoid  vegetations): 

(a) Pharynx and tonsils.

(b) Others.

116. Diseases of the esophagus.

117. Ulcer of the stomach and duodenum:

(a) Stomach.

(b) Duodenum.

118.  Other  diseases  of  the  stomach  (cancer  excepted). 

119.  Diarrhea  and  enteritis  (under  two  years  of  age). 

120.  Diarrhea,  enteritis,  and  ulceration  of  intestines  (two  years  and  over): 

(a)  Diarrhea,  enteritis. 

(b) Ulceration of intestines.

121.  Appendicitis. 

122.  Hernia,  intestinal  obstruction: 

(a)  Hernia. 

(b) Intestinal obstruction.

123.  Other  diseases  of  the  intestines. 

124.  Cirrhosis  of  the  liver: 

(a)  Specified  as  alcoholic. 

(b) Not specified as alcoholic.

125.  Other  diseases  of  the  liver  (including  yellow  atrophy  of  liver): 

(a)  Yellow  atrophy  of  liver. 

(b)  Others. 

126.  Biliary  calculi. 

127.  Other  diseases  of  the  gall-bladder  and  biliary  passages. 

128.  Diseases  of  the  pancreas. 

129.  Peritonitis,  cause  not  specified. 

X.  Diseases  of  the  Genito-urinary  System 

130.  Acute  nephritis. 

131.  Chronic  nephritis. 

132.  Nephritis,  unspecified. 

133.  Other  diseases  of  the  kidneys  and  ureters  (puerperal  diseases  excepted): 

(a)  Pyelitis. 

(b) Others.

134. Calculi of the urinary passages:

(a) Calculi of the kidneys and ureters.

(b) Calculi of the bladder.

(c) Other and unspecified.

135. Diseases of the bladder (tumors excepted):

(a) Cystitis.

(b) Others.

136. Diseases of the urethra, urinary abscess, etc.:

(a) Stricture of the urethra.

(b) Others.

137.  Diseases  of  the  prostate. 

138.  Diseases  of  the  male  genital  organs — not  specified  as  venereal. 

139.  Diseases  of  the  female  genital  organs — not  specified  as  venereal: 

(a)  Ovaries,  tubes,  and  parametrium. 

(b)  Uterus. 

(c)  Breast. 

(d)  Others. 

XI.  Diseases  of  Pregnancy,  Childbirth,  and  the  Puerperal  State 

140.  Abortion  with  septic  conditions. 

141.  Abortion  without  mention  of  septic  conditions  (to  include  hemorrhages). 

142.  Ectopic  gestation. 

143.  Other  accidents  of  pregnancy  (not  to  include  hemorrhages). 

144.  Puerperal  hemorrhage: 

(a)  Placenta  previa. 

(b)  Other  hemorrhages. 

145.  Puerperal  septicemia  (not  specified  as  due  to  abortion): 

(a)  Puerperal  septicemia  and  pyemia. 

(b) Puerperal tetanus.

146.  Puerperal  albuminuria  and  eclampsia. 

147.  Other  toxemias  of  pregnancy. 

148. Puerperal phlegmasia alba dolens, embolus, sudden death (not specified as septic):

(a)  Phlegmasia  alba  dolens. 

(b)  Embolism  and  thrombosis. 

149.  Other  accidents  of  childbirth: 

(a)  Cesarean  operation. 

(b)  Others. 

150.  Other  and  unspecified  conditions  of  the  puerperal  state. 

XII.  Diseases  of  the  Skin  and  Cellular  Tissue 

151.  Furuncle,  carbuncle. 

152.  Phlegmon,  acute  abscess. 

153.  Other  diseases  of  the  skin  and  annexa,  and  of  the  cellular  tissue. 

XIII.  Diseases  of  the  Bones  and  Organs  of  Locomotion 

154.  Osteomyelitis. 

155.  Other  diseases  of  the  bones  (tuberculosis  excepted). 

156.  Diseases  of  the  joints  and  other  organs  of  locomotion: 

(a)  Joints  (tuberculosis  and  rheumatism  excepted). 

(b) Other organs of locomotion.

XIV. Congenital Malformations

157.  Congenital  malformation  (still-births  not  included): 

(a)  Congenital  hydrocephalus. 

(b)  Spina  bifida  and  meningocele. 

(c)  Congenital  malformation  of  the  heart. 

(d)  Monstrosities. 

(e) Others.

XV.  Diseases  of  Early  Infancy 

158.  Congenital  debility. 

159.  Premature  birth. 

160.  Injury  at  birth: 

(a)  Cesarean  operation. 

(b)  Without  Cesarean  operation. 

161.  Other  diseases  peculiar  to  early  infancy: 

(a)  Atelectasis. 

(b)  Icterus  of  the  newborn. 

(c) Sclerema.

(d) Others.

XVI.  Senility 

162.  Senility: 

(a)  With  senile  dementia. 

(b)  Without  senile  dementia. 

XVII.  Violent  and  Accidental  Deaths 

All  violent  or  accidental  deaths  should  be  included  under  the  headings  163 
to  198  so  that  all  deaths  without  exception  are  included  under  one  or  other  of 
the  200  rubrics  of  the  list. 

For the deaths included in numbers 176 to 195, a second independent tabulation
under the following headings is obligatory:

1.  Accidents  in  mines  and  quarries. 

2.  Accidents  caused  by  machinery. 

3.  Accidents  by  means  of  transportation: 

(a)  Railroads  and  street  cars. 

(b)  Automobiles,  motorcycles. 

(c)  Other  means  of  transportation  by  land. 

(d)  Transportation  by  water. 

(e)  Transportation  by  air. 
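
The second independent tabulation may be pictured as a separate tally made over the same records. What follows is a sketch under assumptions: each death is represented as a pair of its list number and an optional code for the second heading, the codes "1", "2", "3a" and so on are invented labels for the headings above, and a death to which none of the headings applies is assumed to carry no code.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

# Headings of the second tabulation, keyed by hypothetical short codes.
SECOND_TABULATION = {
    "1": "Accidents in mines and quarries",
    "2": "Accidents caused by machinery",
    "3a": "Railroads and street cars",
    "3b": "Automobiles, motorcycles",
    "3c": "Other means of transportation by land",
    "3d": "Transportation by water",
    "3e": "Transportation by air",
}


def second_tabulation(deaths: Iterable[Tuple[int, Optional[str]]]) -> Counter:
    """Tally deaths numbered 176 to 195 under the second set of headings."""
    tally: Counter = Counter()
    for number, heading in deaths:
        if not 176 <= number <= 195 or heading is None:
            continue  # outside the obligatory range, or no heading applies
        if heading not in SECOND_TABULATION:
            raise ValueError("unknown heading code: " + heading)
        tally[heading] += 1
    return tally


# Hypothetical records: (number in the detailed list, second-heading code).
records = [
    (186, "1"),    # crushing or landslide, in a mine
    (183, "3d"),   # accidental drowning, transportation by water
    (183, None),   # accidental drowning, no heading applies
    (194, "3b"),   # other accident, automobile
    (167, "3a"),   # suicide, outside numbers 176 to 195, so ignored
]
```

The first tabulation by list number is left untouched. The tally above is made in addition to it, which is the sense of the word "independent" in the instruction.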

163.  Suicide  by  solid  or  liquid  poisons  or  by  absorption  of  corrosive  substances: 

(a)  Arsenic. 

(b)  Hydrocyanic  acid. 

(c) Opium, morphin, laudanum.

(d) Strychnin.

(e) Corrosive sublimate.

(f) Carbolic acid.

(g)  Lysol. 

(h)  Other  poisons  or  kind  not  stated. 

164. Suicide by poisonous gas.

165. Suicide by hanging or strangulation.

166.  Suicide  by  drowning. 

167.  Suicide  by  firearms. 

168.  Suicide  by  cutting  or  piercing  instruments. 

169.  Suicide  by  jumping  from  high  places. 

170.  Suicide  by  crushing. 

171.  Suicide  by  other  means. 

172.  Infanticide  (murder  of  infants  under  one  year): 

(a)  Immediately  after  birth. 

(b)  Others,  under  one  year. 

173.  Homicide  by  firearms  (persons  one  year  and  over). 

174.  Homicide  by  cutting  or  piercing  instruments  (persons  one  year  and  over). 

175. Other homicides of persons one year and over.

176.  Attack  by  venomous  animals. 

177.  Poisoning  by  food. 

178.  Accidental  absorption  of  poisonous  gas. 

179.  Other  acute  accidental  poisonings  (gas  excepted): 

(a)  Wood  alcohol. 

(b) Denatured alcohol.

(c) Carbolic acid.

(d) Opium, morphin, laudanum.

(e) Strychnin.

(f) Other poisons or kind not stated.

180.  Conflagration. 

181.  Accidental  burns  (conflagration  excepted). 

182.  Accidental  mechanical  suffocation. 

183.  Accidental  drowning. 

184.  Accidental  traumatism  by  firearms  (wounds  of  war  excepted). 

185.  Accidental  traumatism  by  cutting  or  piercing  instruments  (wounds  of  war 

excepted). 

186.  Accidental  traumatism  by  fall,  crushing,  landslide: 

(a)  Fall  down  stairs. 

(b) Fall in building operations.

(c) Other falls.

(d)  Crushing,  landslide. 

187. Cataclysm (all deaths attributed to a cataclysm regardless of their nature).

188.  Injuries  by  animals. 

189.  Hunger  or  thirst. 

190.  Excessive  cold. 

191.  Excessive  heat. 

192.  Lightning. 

193.  Accidents  due  to  electric  currents. 

194.  Other  accidents: 

(a)  Foreign  body. 

(b) Others.

195.  Violent  deaths  of  which  the  nature  (accident,  suicide,  homicide)  is  unknown. 

196.  Wounds  of  war. 

197.  Execution  of  civilians  by  belligerent  armies. 

198. Legal executions.

XVIII. Ill-defined Causes of Death

199.  Sudden  death. 

200.  Cause  of  death  not  specified  or  ill-defined: 

(a)  Ill-defined. 

(b)  Not  specified  or  unknown. 

In  addition  to  the  detailed  list  as  given  above  the  Commission 
recommended,  in  its  1929  revision,  two  other  briefer  lists.  The 
first of these, called the Intermediate List, contains 85 titles. The
second, called the Abridged List, contains 43 titles.
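
Since each title of a briefer list is defined by numbers of the detailed list, a tabulation made on the detailed list can be condensed without returning to the certificates. The sketch below assumes invented counts and a hypothetical function name, and it carries only four of the 85 intermediate titles. The number ranges shown are those printed against these titles in the Intermediate List.

```python
from typing import Dict, List, Tuple

# Four titles of the Intermediate List with their detailed-list numbers.
INTERMEDIATE: Dict[int, Tuple[str, List[int]]] = {
    1: ("Typhoid fever and paratyphoid fever", [1, 2]),
    11: ("Tuberculosis of the respiratory system", [23]),
    12: ("All other tuberculosis", list(range(24, 33))),              # 24 to 32 inclusive
    18: ("Cancer and other malignant tumors", list(range(45, 54))),   # 45 to 53 inclusive
}


def collapse(
    detailed: Dict[int, int], mapping: Dict[int, Tuple[str, List[int]]]
) -> Dict[int, int]:
    """Sum detailed-list counts into the intermediate titles given in mapping."""
    return {
        title_no: sum(detailed.get(n, 0) for n in numbers)
        for title_no, (_title, numbers) in mapping.items()
    }


# Hypothetical deaths by detailed-list number.
detailed_counts = {1: 10, 2: 3, 23: 120, 24: 9, 32: 4, 45: 7, 53: 2, 54: 5}

intermediate_counts = collapse(detailed_counts, INTERMEDIATE)
```

With these invented figures, title 1 receives 13 deaths, title 11 receives 120, title 12 receives 13, and title 18 receives 9. The five deaths under No. 54 belong to intermediate title 19, which is not carried in this sketch, and so appear in none of the four sums.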

These  lists  are  as  follows: 

INTERMEDIATE  LIST 

(The  numbers  in  parentheses  are  those  of  the  detailed  list  given  above.) 

I.  Infectious  and  Parasitic  Diseases 

1.  Typhoid  fever  and  paratyphoid  fever  (1  and  2). 

2.  Typhus  fever  (3). 

3.  Smallpox  (6). 

4.  Measles  (7). 

5.  Scarlet  fever  (8). 

6.  Whooping-cough  (9). 

7.  Diphtheria  (10). 

8.  Influenza  (11). 

9.  Dysentery  (13). 

10.  Plague  (14). 

11.  Tuberculosis  of  the  respiratory  system  (23). 

12.  All  other  tuberculosis  (24  to  32  inclusive). 

13.  Syphilis  (34). 

14.  Purulent  infection,  septicemia,  non-puerperal  (36). 

15.  Malaria  (38). 

16.  Other  diseases  due  to  protozoa  or  helminths  (39  to  42  inclusive). 

17.  Other  infectious  and  parasitic  diseases*  (4,  5,  12,  15  to  22  inclusive,  33,  35, 

37,  43,  and  44). 

II.  Cancers  and  Other  Tumors 

18.  Cancer  and  other  malignant  tumors  (45  to  53  inclusive). 

19.  Non-malignant  tumors  (or  of  which  the  nature  is  not  specified)  (54  and  55). 

III. Rheumatic Diseases, Nutritional Diseases, Diseases of the Endocrine Glands, and Other General Diseases

20.  Acute  rheumatic  fever  (56). 

21.  Chronic  rheumatism  and  gout  (57  and  58). 

* The  other  infectious  diseases  should  be  specified  when  they  cause  an 
appreciable  mortality,  and  certain  of  them  (cholera,  yellow  fever,  leprosy)  should 
be specified even if they cause only a single death.

22. Diabetes mellitus (59).

23.  Diseases  due  to  vitamin  deficiencies  (60  to  64  inclusive). 

24.  Diseases  of  the  thyroid  and  parathyroid  glands  (66). 

25.  Other  general  diseases  (65,  67  to  69  inclusive). 

IV.  Diseases  of  the  Blood  and  Blood-making  Organs 

26.  Pernicious  and  other  anemias  (71). 

27. Leukemias, pseudoleukemias, and other diseases of the blood and blood-making organs (70, 72 to 74).

V.  Chronic  Poisonings  and  Intoxications 

28.  Alcoholism  (acute  or  chronic)  (75). 

29.  Chronic  poisoning  (76  and  77). 

VI.  Diseases  of  the  Nervous  System  and  of  the  Organs  of  Special  Sense 

30.  Meningitis  (79). 

31.  Progressive  locomotor  ataxia  (80). 

32.  Cerebral  hemorrhage,  cerebral  embolism  and  thrombosis  (82). 

33.  General  paralysis  of  the  insane  (83). 

34.  Dementia  praecox  and  other  psychoses  (84). 

35.  Epilepsy  (85). 

36.  Other  diseases  of  the  nervous  system  (78,  81,  86,  and  87). 

37.  Diseases  of  the  eye,  the  ear,  and  the  annexa  (88  and  89). 

VII. Diseases of the Circulatory System

38.  Pericarditis  (90). 

39.  Acute  endocarditis  (91). 

40.  Chronic  endocarditis,  valvular  diseases  (92). 

41.  Diseases  of  the  myocardium  (93). 

42.  Diseases  of  the  coronary  arteries,  and  angina  pectoris  (94). 

43. Other diseases of the heart (95).

44.  Aneurysm  (except  of  the  heart)  (96). 

45.  Arteriosclerosis  and  gangrene  (97  and  98). 

46.  Other  diseases  of  the  circulatory  system  (99  to  103  inclusive). 

VIII.  Diseases  of  the  Respiratory  System 

47.  Bronchitis  (106). 

48.  Pneumonia  (107  to  109  inclusive). 

49.  Pleurisy  (110). 

50. Other diseases of the respiratory system (tuberculosis excepted) (104 and 105, 111 to 114 inclusive).

IX.  Diseases  of  the  Digestive  System 

51.  Ulcer  of  the  stomach  and  duodenum  (117). 

52.  Diarrhea  and  enteritis  (under  two  years  of  age)  (119). 

53.  Diarrhea,  enteritis,  and  ulceration  of  intestines  (two  years  and  over)  (120). 

54. Appendicitis (121).

55. Hernia, intestinal obstruction (122).

56.  Cirrhosis  of  the  liver  (124). 

57. Other diseases of the liver and biliary passages (including biliary calculi) (125 to 127 inclusive).

58.  Other  diseases  of  the  digestive  system  (115,  116,  118,  123,  128,  and  129). 

X.  Diseases  of  the  Genito-urinary  System 

59.  Nephritis  (130  to  132  inclusive). 

60.  Other  diseases  of  the  kidneys  and  ureters  (puerperal  diseases  excepted)  (133). 

61.  Calculi  of  the  urinary  passages  (134). 

62.  Diseases  of  the  bladder  (tumors  excepted)  (135). 

63.  Diseases  of  the  urethra,  urinary  abscess,  etc.  (136). 

64.  Diseases  of  the  prostate  (137). 

65.  Diseases  of  the  genital  organs — not  specified  as  venereal  (138  and  139). 

XI.  Diseases  of  Pregnancy,  Childbirth,  and  the  Puerperal  State 

66.  Accidents  of  pregnancy  (141,  142,  143). 

67.  Puerperal  hemorrhage  (144). 

68.  Septicemia  and  puerperal  infection  (140,  145). 

69.  Toxemias  of  pregnancy  (albuminuria  and  eclampsia)  (146  and  147). 

70.  Other  puerperal  diseases  (148  to  150  inclusive). 

XII.  Diseases  of  the  Skin  and  Cellular  Tissue 

71.  Diseases  of  the  skin  and  cellular  tissue  (151  to  153  inclusive). 

XIII.  Diseases  of  the  Bones  and  Organs  of  Locomotion 

72. Diseases of the bones and of the organs of locomotion (tuberculosis and rheumatism excepted) (154 to 156 inclusive).

XIV.  Congenital  Malformations 

73.  Congenital  malformations  (still-births  not  included)  (157). 

XV.  Diseases  of  Early  Infancy 

74.  Congenital  debility  (158). 

75.  Premature  birth  (159). 

76.  Injury  at  birth  (160). 

77.  Other  diseases  peculiar  to  early  infancy  (161). 

XVI.  Senility 

78.  Senility  (162). 

XVII.  Violent  and  Accidental  Deaths 

79.  Suicide  (163  to  171  inclusive). 

80.  Homicide  (172  to  175  inclusive). 

81. Accident (176 to 194 inclusive).

82. Violent deaths of which the nature (accident, suicide, homicide) is unknown (195).

83. Wounds of war (including execution of civilians by belligerent armies) (196 and 197).

84.  Legal  executions. 

XVIII.  Ill-defined  Causes  of  Death 

85.  Cause  of  death  not  specified,  or  ill  defined  (199  and  200). 

ABRIDGED  LIST 

(The  numbers  in  parentheses  are  those  of  the  Detailed  List.) 

I 

1.  Typhoid  fever  and  paratyphoid  fever  (1  and  2). 

2.  Typhus  fever  (3). 

3.  Smallpox  (6). 

4.  Measles  (7). 

5.  Scarlet  fever  (8). 

6.  Whooping-cough  (9). 

7.  Diphtheria  (10). 

8.  Influenza  (11). 

9.  Plague  (14). 

10.  Tuberculosis  of  the  respiratory  system  (23). 

11.  All  other  tuberculosis  (24  to  32  inclusive). 

12.  Syphilis  (34). 

13.  Malaria  (38). 

14. Other infectious and parasitic diseases (4, 5, 12, 13, 15 to 22 inclusive, 33, 35 to 37 inclusive, 39 to 44 inclusive). (See note to corresponding item in Intermediate List.)

II 

15.  Cancer  and  other  malignant  tumors  (45  to  53  inclusive). 

16.  Non-malignant  tumors  (or  of  which  the  nature  is  not  specified)  (54  and  55). 

III, IV, V, and VI

17.  Chronic  rheumatism  and  gout  (57  and  58). 

18.  Diabetes  mellitus  (59). 

19.  Alcoholism  (acute  or  chronic)  (75). 

20. Other general diseases and chronic poisonings (56, 60 to 74 inclusive, 76 and 77).

21.  Progressive  locomotor  ataxia  and  general  paralysis  of  the  insane  (80,  83). 

22.  Cerebral  hemorrhage,  cerebral  embolism  and  thrombosis  (82). 

23. Other diseases of the nervous system and of the organs of special sense (78, 79, 81, 84 to 89 inclusive).

VII 

24.  Diseases  of  the  heart  (90  to  95  inclusive). 

25. Other diseases of the circulatory system (96 to 103 inclusive).

VIII

26. Bronchitis (106).

27. Pneumonia (all forms) (107 to 109 inclusive).

28. Other diseases of the respiratory system (tuberculosis excepted) (104 and 105, 110 to 114 inclusive).

IX

29. Diarrhea and enteritis (119 and 120).

30. Appendicitis (121).

31. Diseases of the liver and biliary passages (124 to 127 inclusive).

32. Other diseases of the digestive system (115 to 118 inclusive, 122, 123, 128, and 129).

X

33. Nephritis (130 to 132).

34. Other diseases of the genito-urinary system (133 to 139 inclusive).

XI

35. Septicemia and puerperal infection (140 and 145).

36. Other diseases of pregnancy, of childbirth, and of the puerperal state (141 to 144 inclusive, 146 to 150 inclusive).

XII and XIII

37. Diseases of the skin, of the cellular tissue, of the bones, and of the organs of locomotion (151 to 156 inclusive).

XIV and XV

38. Congenital debility, congenital abnormalities, premature birth, etc. (157 to 161 inclusive).

XVI

39. Senility (162).

XVII

40. Suicide (163 to 171 inclusive).

41. Homicide (172 to 175 inclusive).

42. Violent or accidental death (except suicide and homicide) (176 to 198 inclusive).

XVIII

43. Cause of death not specified, or ill defined (199 and 200).
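
The correspondence between the Abridged List and the Detailed List can be checked mechanically. The following Python sketch is an illustration only, not part of the Commission's text: it transcribes the parenthesized Detailed List numbers of the 43 abridged titles as printed above, builds a lookup from detailed number to abridged title, and tests whether every number from 1 to 200 is assigned exactly once. The names `ABRIDGED`, `expand`, and `build_lookup` are hypothetical.

```python
# Abridged List title number -> Detailed List numbers, transcribed from the
# list printed above ("a-b" stands for "a to b inclusive").
ABRIDGED = {
    1: "1-2", 2: "3", 3: "6", 4: "7", 5: "8", 6: "9", 7: "10", 8: "11",
    9: "14", 10: "23", 11: "24-32", 12: "34", 13: "38",
    14: "4, 5, 12, 13, 15-22, 33, 35-37, 39-44",
    15: "45-53", 16: "54-55", 17: "57-58", 18: "59", 19: "75",
    20: "56, 60-74, 76-77", 21: "80, 83", 22: "82",
    23: "78, 79, 81, 84-89", 24: "90-95", 25: "96-103",
    26: "106", 27: "107-109", 28: "104-105, 110-114",
    29: "119-120", 30: "121", 31: "124-127",
    32: "115-118, 122, 123, 128, 129",
    33: "130-132", 34: "133-139", 35: "140, 145",
    36: "141-144, 146-150", 37: "151-156", 38: "157-161", 39: "162",
    40: "163-171", 41: "172-175", 42: "176-198", 43: "199-200",
}


def expand(spec):
    """Expand a spec such as '4, 5, 15-22' into a list of integers."""
    numbers = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            low, high = part.split("-")
            numbers.extend(range(int(low), int(high) + 1))
        else:
            numbers.append(int(part))
    return numbers


def build_lookup(table):
    """Return {detailed number: abridged number}; reject double assignment."""
    lookup = {}
    for abridged_no, spec in table.items():
        for detailed_no in expand(spec):
            if detailed_no in lookup:
                raise ValueError(f"detailed title {detailed_no} assigned twice")
            lookup[detailed_no] = abridged_no
    return lookup


LOOKUP = build_lookup(ABRIDGED)
# Detailed titles (1 to 200) that no abridged title accounts for.
missing = [n for n in range(1, 201) if n not in LOOKUP]
print("Detailed 108 falls under abridged title", LOOKUP[108])
print("Unassigned detailed titles:", missing)
```

A check of this kind is useful whenever figures compiled on the Detailed List are condensed to a shorter list, since a title omitted or counted twice would make the condensed totals disagree with the detailed ones.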


RECOMMENDATIONS  OF  THE  INTERNATIONAL  COMMISSION 

Certain recommendations made by the Commission in connection
with the 1929 revision are of general interest. The following
notes are free renderings of and running comments upon the sense
of certain items in the Procès Verbaux, made in advance of their
definitive publication, and therefore not official.

The Commission regards it as a matter of first importance
that  serious  efforts  should  be  made  in  every  country  to  give  special 
instruction  to  practitioners  and  students  of  medicine  regarding  the 
principles  according  to  which  death  certificates  should  be  filled 
out.  This  is  a sound  recommendation.  It  can  probably  be  regarded 
as  certain  that  if  the  medical  schools  in  this  country  gave  attention 
seriously  to  this  matter  the  quality  of  our  vital  statistics  in  respect 
of  the  causes  of  death  would  be  measurably  improved  within  a 
decade. 

In  regard  to  the  use  of  the  International  List  the  Commission 
recommends  that  countries  not  in  a position  to  apply  the  Detailed 
List  in  all  its  subdivisions  should  nevertheless  adhere  to  the  con- 
vention and  furnish  figures  for  groups  of  causes  of  deaths,  which 
should  not  be  more  condensed  than  those  of  the  Intermediate  List. 

It  is  recommended  that  death  certificates  of  persons  dying  after 
a surgical  operation  contain  statements  as  to  the  morbid  condition 
leading  to  surgical  intervention  and  the  nature  of  the  surgical 
operation  performed.  Such  procedure  would  certainly  enhance  the 
value  of  the  returns  for  the  student. 

In  view  of  the  great  sociological  interest  and  importance  of 
industrial  (occupational)  accident  mortality  two  recommendations 
are  made:  (1)  That  the  death  certificate  shall  state,  with  the 
maximum  attainable  precision,  the  last  occupation  followed  by  the 
deceased,  and  (2)  that  governments  should  consider  whether,  in 
addition  to  the  information  now  given  on  death  certificates,  there 
should  not  also  be  a specific  statement  as  to  whether  the  accident 
leading  to  death  is,  or  is  not,  to  be  regarded  as  occupational,  at 
least  in  the  case  of  the  principal  rubrics  under  accidental  deaths. 

STILL-BIRTHS  AND  MORBIDITY 

In connection with the 1929 revision the International Commission
prepared a brief list of causes of still-births (mortinatalité),
as follows:

I.  Death  of  the  Fetus  During  Gestation 

1.  Syphilis  and  other  chronic  diseases. 

2.  Toxemia  of  pregnancy  (eclampsia,  albuminuria,  retroplacental  hemorrhage). 

3.  Malformation  incompatible  with  life. 

4. Other causes and causes not specified.

II. Deaths from Premature Birth

5.  Maternal  overwork. 

6.  Traumatism  producing  premature  labor. 

7.  Placenta  previa. 

8.  Acute  infection. 

9.  Chronic  infection,  particularly  syphilis. 

10.  Other  causes,  and  causes  not  specified. 

III.  Death  of  the  Fetus  During  Parturition 

11.  Abnormal  presentations. 

12.  Obstacles  to  the  expulsion  of  the  child. 

13.  Other  causes  and  causes  not  specified. 

The Commission has made another departure in codifying the
names of diseases, as distinguished from the nomenclature of the
causes of death. The text, here rendered into English from the
French, follows:

NOMENCLATURE OF DISEASES

The nomenclature of diseases differs from the nomenclature of causes of
death only in the subdivision of certain rubrics, designated by capital
letters, A, B, C, etc.

Only the rubrics so subdivided are reproduced here.

34. Syphilis.

A. Congenital.

B. Acquired.

1. Primary.

2. Secondary.

3. Tertiary.

C. Not specified.

35. Gonococcal infection and other venereal diseases.

A. Gonococcal infections (ophthalmia excepted).

B. Gonococcal ophthalmia.

C. Other venereal diseases.

43. Mycoses.

A. Ringworm, trichophytosis, and favus.

B. Other mycoses.

88. Diseases of the organs of vision.

A. Conjunctivitis.

B. Keratitis.

C. Iritis.

D. Cataract.

E. Retinitis.

F. Glaucoma.

G. Others.

115. Diseases of the buccal cavity.

A. Diseases of the teeth or gums.

B. Others.

149. Other accidents of childbirth.

Although it is not a disease, a rubric “normal childbirth” is
necessary for the statistics of persons present in hospitals,
maternity homes, etc.

A. Normal childbirth.

B. Accidents of childbirth.

153. Other diseases of the skin, of its annexa, and of the cellular tissue.

A. Alopecia areata.

B. Other diseases.

158. Congenital debility.

Although these are not sick persons, a rubric “newborn infants
discharged from the hospital or maternity home without having been ill”
is necessary for the statistics of persons present in hospitals,
maternity homes, etc.

A. Infants discharged from the hospital without having been ill.

B. Congenital debility.

194. Other accidents.

A. Foreign body.

B. Dislocation.

C. Sprain.

D. Fracture (not otherwise specified).

E. Wound.

F. Others.

200. Causes not specified or ill defined.

A. Causes not specified or ill defined.

B. Overwork.

C. Malingering; patient under observation.

Although it is not a true disease, a rubric “malingering” is
necessary for the statistics of persons who have stayed in a hospital,
a nursing home, etc.

THE  OFFICIAL  STATISTICAL  TREATMENT  OF  JOINT  CAUSES  OF  DEATH 

Few  persons  not  professional  vital  statisticians  understand  the 
real  meaning  of  mortality  statistics  tabled  under  the  International 
Classification.  The  official  charged  with  compiling  such  statistics 
has  to  work  under  a set  of  essentially  arbitrary  rules.  Otherwise 
he  never  could  make  an  intelligent  compilation,  because  of  two 
important  facts: 

1.  Some  physicians  all  the  time,  and  all  physicians  some  of  the 
time,  will  use  their  own  terminology  instead  of  that  of  the  Inter- 
national Classification  in  reporting  the  cause  of  death  on  the 
original  death  certificate. 

2. Physicians will, quite properly, report more than one morbid
condition as a causal factor in the death.

What shall the vital statistician do under such premises?
What he actually does do is so important for a right understanding
of what official vital statistics of the present day really mean
medically, that it seems desirable to reproduce here, in part, the
excellent  discussion  of  the  matter  contained  in  the  last  issued 
“Manual  of  the  International  List.”  This  discussion  shows  the 
general  principles  according  to  which  causes  of  death  are  handled 
in  modern  statistical  offices.  From  time  to  time  some  slight 
modifications  in  respect  of  details  are  made.  Discussions  of  these 
modifications  and  accounts  of  the  procedure  under  the  rules  are 
embodied  each  year  in  the  textual  matter  of  the  annual  volumes  of 
Mortality  Statistics  from  the  Census  Bureau.  Here  we  are  only 
concerned  with  general  principles. 

The  expression  “joint  causes  of  death”  is  a convenient  one  for 
those  cases  in  which  the  physician  reports  two  or  more  causes  or 
conditions  upon  the  certificate  of  death  of  an  individual.  According 
to  the  general  practice  of  statistical  compilation  only  one  cause 
can  be  tabulated  for  each  death,  consequently  a process  of  selection 
is  necessary.  The  method  employed  for  this  purpose  may  have  a 
very  considerable  influence  upon  the  resulting  statistics.  Dr. 
Julius  J.  Pikler*  has  very  forcefully  directed  attention  to  the  impor- 
tance of  the  study  of  contributory  causes  of  death  that  usually  are 
lost  entirely  in  compilation,  but  the  full  statement  of  such  causes 
would  be  difficult,  especially  for  related  tables  and  a detailed 
classification,  in  a report  dealing  with  large  numbers  of  returns. 
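
The loss of contributory causes in single-cause tabulation can be seen in a small count. The Python sketch below is illustrative only: the three certificates are hypothetical, and the cause selected for each follows examples quoted in the rules given later in this section (measles with bronchopneumonia is classed as measles; cancer with bronchopneumonia as cancer).

```python
from collections import Counter

# Hypothetical certificates: (causes as written, cause selected for tabulation).
certificates = [
    (["measles", "bronchopneumonia"], "measles"),
    (["cancer", "bronchopneumonia"], "cancer"),
    (["bronchopneumonia"], "bronchopneumonia"),
]

# One tabulated cause per death, as in the general practice described above.
tabulated = Counter(selected for _, selected in certificates)

# Every cause mentioned on any certificate, contributory causes included.
mentioned = Counter(cause for causes, _ in certificates for cause in causes)

for cause in sorted(mentioned):
    print(f"{cause}: tabulated {tabulated[cause]}, mentioned {mentioned[cause]}")
```

In this made-up sample bronchopneumonia is written on all three certificates but is tabulated for only one death, which is the kind of information the study of contributory causes would recover.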

The  International  Commission  did  not  give  special  consideration 
to  this  subject  in  1909,  but  at  the  suggestion  of  Dr.  Bertillon  it  was 
agreed  that  the  rules  employed  since  1900  should  be  continued  in 
force  and  a special  committee  was  appointed  to  report  on  the  sub- 
ject. Following  are  the  rules  in  question  as  given  in  the  French 
edition  of  1903: 

1.  If  one  of  the  two  diseases  is  an  immediate  and  frequent  complication  of  the 
other,  the  death  should  be  classified  under  the  head  of  the  primary  disease.  Examples: 

Infantile diarrhea and convulsions, classify as infantile diarrhea.

Measles and bronchopneumonia, classify as measles.

Scarlet fever and diphtheria, classify as scarlet fever.

Scarlet fever and nephritis, classify as scarlet fever.

* Das Budapester System der Todesursachenstatistik, 1909.

2. If the preceding rule is not applicable, the following should be used: If one of
the  diseases  is  surely  fatal*  and  the  other  is  of  less  gravity,  the  former  should  be 
selected  as  the  cause  of  death.  Examples: 

Cancer  and  bronchopneumonia,  classify  as  cancer. 

Pulmonary  tuberculosis  and  puerperal  septicemia,  classify  as  tuberculosis. 

Icterus gravis and pericarditis, classify as icterus gravis.

3.  If  neither  of  the  above  rules  is  applicable,  then  the  following:  If  one  of  the 
diseases  is  epidemic  and  the  other  is  not,  choose  the  epidemic  disease.  Examples: 

Typhoid fever and saturnism, classify as typhoid fever.

Measles  and  biliary  calculi,  classify  as  measles. 

4.  If  none  of  the  three  preceding  rules  is  applicable,  the  following  may  be  used: 
If  one  of  the  diseases  is  much  more  frequently  fatal  than  the  other,  then  it  should  be 
selected  as  the  cause  of  death.  Examples: 

Rheumatism (without metastasis) and salpingitis, classify as salpingitis.

Pericarditis and appendicitis, classify as pericarditis.

5.  If  none  of  the  four  preceding  rules  applies,  then  the  following:  If  one  of  the  dis- 
eases is  of  rapid  development  and  the  other  is  of  slow  development,  the  disease  of  rapid 
development  should  be  taken.  Examples: 

Diabetes  and  icterus  gravis,  classify  as  icterus  gravis. 

Cirrhosis  and  angina  pectoris,  classify  as  angina  pectoris. 

Pleurisy  and  senile  debility,  classify  as  pleurisy. 

6.  If  none  of  the  above  five  rules  applies,  then  the  diagnosis  should  be  selected 
that  best  characterizes  the  case.  Example: 

Saturnism  and  peritonitis,  classify  as  saturnism. 
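
The six rules form an ordered cascade: each is consulted only when all those before it fail to decide. A minimal Python sketch of that ordering follows. It is an illustration under stated assumptions, not the Commission's procedure: the `Disease` record, its fields, and every attribute value assigned in the examples are hypothetical, and rule 4 ("much more frequently fatal") is reduced to a comparison of rough ranks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Disease:
    name: str
    complications: frozenset = frozenset()  # names of its frequent complications
    surely_fatal: bool = False  # judged apart from all treatment
    epidemic: bool = False
    fatality: int = 0  # rough rank: higher means more frequently fatal
    rapid: bool = False  # rapid rather than slow development


def select(a, b):
    """Return (chosen disease name, number of the deciding rule)."""
    # Rule 1: the primary disease outranks its immediate, frequent complication.
    if b.name in a.complications:
        return a.name, 1
    if a.name in b.complications:
        return b.name, 1
    # Rule 2: a surely fatal disease outranks one of less gravity.
    if a.surely_fatal != b.surely_fatal:
        return (a if a.surely_fatal else b).name, 2
    # Rule 3: an epidemic disease outranks a non-epidemic one.
    if a.epidemic != b.epidemic:
        return (a if a.epidemic else b).name, 3
    # Rule 4: the more frequently fatal disease is chosen.
    if a.fatality != b.fatality:
        return (a if a.fatality > b.fatality else b).name, 4
    # Rule 5: the disease of rapid development is chosen.
    if a.rapid != b.rapid:
        return (a if a.rapid else b).name, 5
    # Rule 6 calls for judgment; None marks the case for review by hand.
    return None, 6


# Attribute values below are assumptions made for the sake of the example.
measles = Disease("measles", frozenset({"bronchopneumonia"}), epidemic=True)
bronchopneumonia = Disease("bronchopneumonia", rapid=True)
cancer = Disease("cancer", surely_fatal=True)
typhoid = Disease("typhoid fever", epidemic=True)
saturnism = Disease("saturnism")

print(select(measles, bronchopneumonia))
print(select(cancer, bronchopneumonia))
print(select(typhoid, saturnism))
```

Returning the number of the deciding rule along with the choice makes it possible to count how often each rule is actually invoked in a given body of returns.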

Precise diagnoses should be given the preference over vague and indeterminate
ones, such as “Hemorrhage,” “Encephalitis,” etc. Arbitrary decisions should be
avoided as much as possible by the use of the preceding rules. None of them is
absolute, but all are subject to exceptions which may vary according to local usages.†
In practice the first rule, which is the most logical of all, is the one of most frequent
application. The others have been formulated only to prepare for all cases and to
treat them with system and uniformity.

* Apart from all treatment. This provision is necessary to assure stability in the
application of the rules. Otherwise a therapeutic discovery, for example, that of the
antidiphtheric serum, would modify the tables and injure the comparability of the
statistics.

† Particularly we should note the impropriety of certain expressions. For example,
if a physician writes Typhoid fever, chronic nephritis, it is almost certain that he
intended to indicate typhoid fever complicated with albuminuria and not a patient
with Bright’s disease attacked with typhoid fever.

When a disease ordinarily rare or absent undergoes a large extension (e. g., cholera,
yellow fever, etc.) the total deaths should be noted without any exception whatever.
For such cases it is necessary to waive all ordinary rules.

These rules differ but slightly from those given in the Manual of
1902, which were based upon the French edition of 1900. They
are a development of practical experience, as shown by the forms
in which they have appeared in various editions of the International
Classification, and may be compared with the rules given in the
introductory text of the Alphabetische Liste von Krankheiten und
Todesursachen, Kaiserliches Gesundheitsamt, Germany, 1905:

When  several  diseases  are  reported  as  causes  of  death,  the  following  rules  should 
be  observed: 

1.  The  death  is,  as  a rule,  to  be  assigned  to  that  number  which  represents  the 
probable  primary  cause  (Grundleiden).  For  example,  when  nephritis  and  valvular 
heart  disease  are  returned,  the  death  should  be  classified  under  the  heart  disease  as 
the  probable  primary  cause.  Only  when  the  primary  cause  is  not  a real  disease  may 
it  be  disregarded.  For  example,  with  “senile  debility  and  bronchitis”  or  “debility 
and  intestinal  catarrh,”  the  deaths  should  be  classified  not  as  senile  debility  or  con- 
genital debility,  but  as  chronic  bronchitis  and  as  intestinal  catarrh. 

2.  With  two  independent  diseases,  the  more  severe  should  be  chosen. 

3. With an infectious disease and a non-infectious disease, the former should be
chosen. Example: Insanity and typhoid fever, classify as typhoid fever.

4.  If  acute  diseases  are  reported  with  chronic  diseases,  the  acute  diseases  are  to 
be  preferred.  Example:  Gastric  ulcer  and  croupous  pneumonia,  classify  as  croupous 
pneumonia. 

5.  If  two  infectious  diseases  are  reported  as  causes  of  death,  then  smallpox,  scarlet 
fever,  measles,  typhus  fever,  diphtheria  and  croup,  whooping-cough,  croupous  pneu- 
monia, influenza,  typhoid  fever,  paratyphoid  fever,  Weil’s  disease,  relapsing  fever, 
cerebrospinal  fever,  erysipelas,  tetanus,  septicemia,  puerperal  fever,  plague,  Asiatic 
cholera,  dysentery,  anthrax,  glanders,  rabies,  and  trichiniasis  should  have  the  pref- 
erence over  tuberculosis,  malaria,  or  a venereal  disease. 

6.  Causes  of  death  from  violence  are  usually  preferred. 

7.  Such  returns  as  heart  weakness  [“heart  failure”],  cardiac  paralysis,  paralysis 
of  the  lungs,  pulmonary  edema,  coma,  and  the  like,  should  be  disregarded  if  other 
causes  are  named. 

8.  With  tuberculosis  of  several  organs,  including  that  of  the  lungs,  tuberculosis 
of  the  lungs  should  be  selected. 
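
Rule 7 of the German list is a simple filter applied before any choice is made among the remaining causes. The Python sketch below illustrates it under two assumptions of our own: terms are matched by exact lower-case wording, and a certificate naming nothing but such terms is returned unchanged. The names `MODES_OF_DEATH` and `drop_modes_of_death` are hypothetical.

```python
# Terms named in rule 7; "and the like" in the rule means the set is not closed.
MODES_OF_DEATH = {
    "heart weakness",
    "heart failure",
    "cardiac paralysis",
    "paralysis of the lungs",
    "pulmonary edema",
    "coma",
}


def drop_modes_of_death(causes):
    """Disregard mere modes of death when other causes are named."""
    real = [c for c in causes if c.lower() not in MODES_OF_DEATH]
    # If nothing else was reported, keep the return as written.
    return real if real else list(causes)


print(drop_modes_of_death(["Pulmonary edema", "Typhoid fever"]))
print(drop_modes_of_death(["Coma"]))
```

In practice the wording on certificates varies a good deal, so a clerk's judgment, or a much fuller list of synonyms, would be needed where this sketch relies on exact matching.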

It  will  be  interesting  also  to  compare  the  rules  published  by  the 
Society  of  Medical  Officers  of  Health  of  England*: 

Rules  as  to  Classification  of  Causes  of  Death 

With  the  following  exceptions  the  general  rule  should  be  to  select  from  several 
diseases  mentioned  in  the  certificate  the  disease  of  the  longest  duration.  In  the  event 
of  no  duration  being  specified,  the  disease  standing  first  in  order  should  be  assumed 
to  be  the  disease  of  longest  duration. 

Exceptions  to  the  Above  Rule 

Any  one  of  the  chief  infective  diseases  should  be  selected  in  preference  to  any  other 
cause  of  death.  If  two  infective  diseases  in  succession  be  specified,  the  disease  of 
longer  duration  should  be  selected. 

* The New Tables Issued by the Local Government Board and the Schedules of
Causes of Death issued by The Incorporated Society of Medical Officers of Health,
London, 1901.

Thus scarlet fever should be selected in preference to bronchopneumonia, and
phthisis in preference to bronchitis.

Definite diseases, ordinarily known as constitutional diseases, should have
preference over those known as local diseases.

Thus  cancer  should  be  selected  in  preference  to  pneumonia,  and  diabetes  in 
preference  to  heart  disease. 

When  apoplexy  occurs  in  conjunction  with  definite  disease  of  the  heart  or  kidneys, 
the  heart  disease  or  the  kidney  disease,  as  the  case  may  be,  should  be  preferred. 

When hemiplegia is mentioned in connection with embolism, the embolism should
be selected.

When embolism occurs in connection with childbirth, the death should be referred
to accidents of childbirth.

In  calculating  the  death-rate  from  “diarrhea,”  deaths  certified  as  due  to  diarrhea, 
either  alone  or  coupled  with  some  ill-defined  cause  (such  as  “atrophy,”  “debility,” 
“marasmus,”  “thrush,”  “convulsions,”  “teething,”  “old  age,”  or  “senile  decay”), 
epidemic  or  summer  diarrhea,  epidemic  or  zymotic  enteritis,  intestinal  or  enteric  catarrh, 
gastro-intestinal or gastro-enteric catarrh, dysentery or dysenteric diarrhea, cholera (not
being  “Asiatic  cholera”),  cholera  nostras,  cholera  infantum,  and  choleraic  diarrhea 
should  be  included. 

The  following  miscellaneous  examples  are  given  as  indicating  the  method  of 
classification  in  cases  of  difficulty  that  frequently  arise: 


Causes of Death in Order Given in Death Certificate         To be Classified Under

Whooping-cough, bronchopneumonia, scarlet fever.            Whooping-cough, if of longer duration than scarlet fever.
Scarlet fever six months, otitis media, abscess of brain.   Scarlet fever.
Laryngeal and pulmonary phthisis.                           Phthisis.
Pneumonia, old age.                                         Pneumonia.
Old age, bronchitis.                                        Bronchitis.
Phthisis, diabetes mellitus.                                Select disease of longest duration.
Diphtheria nine months, paralysis.                          Diphtheria.
Puerperal perimetritis.                                     Puerperal fever.
Cerebral embolism.                                          Embolism.
Spasmodic croup.                                            Laryngismus stridulus.
Acute hydrocephalus.                                        Tubercular meningitis.
Bronchitis, phthisis.                                       Phthisis.
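
The general rule of the Society, with its first exception, can be put in a few lines. The Python sketch below is an illustration only. The `Cause` record and its fields are hypothetical; which diseases count as "chief infective diseases" is supplied by hand; and where some durations are missing the sketch falls back on the order written, which is our reading of the rule rather than a stated provision. The exceptions for constitutional diseases, apoplexy, and embolism are left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Cause:
    name: str
    months: Optional[float] = None  # duration, if the certificate states one
    infective: bool = False  # marked by hand as a chief infective disease


def longest_or_first(causes):
    """General rule: longest duration; failing durations, first in order."""
    if all(c.months is not None for c in causes):
        return max(causes, key=lambda c: c.months)
    return causes[0]


def select(causes):
    """Apply the infective-disease exception, then the general rule."""
    infective = [c for c in causes if c.infective]
    return longest_or_first(infective or causes).name


print(select([Cause("scarlet fever", 6, True),
              Cause("otitis media"),
              Cause("abscess of brain")]))
print(select([Cause("bronchitis"), Cause("phthisis", infective=True)]))
print(select([Cause("pneumonia"), Cause("old age")]))
```

The row "Old age, bronchitis" in the table above is classed as bronchitis although old age is written first, which shows that ill-defined terms are set aside before the general rule is applied; the sketch does not attempt that step.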


Through  the  kindness  of  Dr.  John  Tatham,  formerly  Medical 
Superintendent of the Registrar-General’s office, England, a copy of
the  Instructions  to  Abstractors,  as  employed  in  that  office  in  1909, 
was  supplied  to  the  Bureau  of  the  Census.  Certain  decisions  of 
special  interest  are  taken  therefrom: 


1. Any general disease (except pyrexia, premature birth, congenital defects, want
of breast milk, teething, and chronic rheumatism) to be taken in preference to any
local disease except aneurysm and strangulated hernia.

2. Any of the following diseases are to be given preference over any other
diseases: Aneurysm, anthrax, Asiatic cholera, cancer, carcinoma, glanders, rabies,
industrial poisoning, malignant disease, opium or morphin habit, puerperal septic
disease, sarcoma, smallpox, strangulated hernia, tetanus, and vaccination.

3.  Any  disease  in  this  group  is  to  be  preferred  over  any  other  disease  except  those 
named  in  the  preceding  group:  Cerebrospinal  fever,  diphtheria,  dysentery,  typhoid 
fever,  German  measles,  malaria,  measles,  mumps,  relapsing  fever,  scarlet  fever, 
typhus  fever,  and  whooping-cough. 

4.  The  following  diseases  to  be  preferred  except  for  those  named  in  the  two 
preceding  lists:  Acute  hydrocephalus,  alcoholism,  influenza,  lupus,  phthisis,  pul- 
monary tuberculosis,  rheumatic  fever  (acute  and  subacute  rheumatism),  scrofula,  syph- 
ilis, tabes  mesenterica,  tuberculous  meningitis,  tuberculous  peritonitis,  tuberculosis  of 
other  organs,  and  general  tuberculosis. 

5.  For  the  following  list,  prefer  the  disease  of  longer  duration  or  the  disease 
first  written:  Carbuncle  (not  anthrax),  diabetes  mellitus,  epidemic  diarrhea,  epi- 
demic enteritis,  enteritis,  diarrhea  due  to  food,  erysipelas,  gout,  hemophilia,  infective 
endocarditis,  infective  enteritis,  pernicious  anemia,  phagedena,  phlegmon  (not  an- 
thrax), pneumonia  (all  forms),  purpura  haemorrhagica,  pyemia  (not  puerperal),  rheu- 
matoid arthritis,  rheumatic  gout,  rheumatism  of  heart,  rickets,  scurvy,  septicemia, 
other  septic  diseases,  septic  infections,  starvation,  and  varicella. 

6.  Premature  birth  and  congenital  defects  (malformations)  to  be  preferred  for 
decedents  under  three  months  of  age  to  other  causes  except  those  of  groups  2 and  3. 

7.  Chlorosis  and  anemia  (not  pernicious)  only  when  alone. 

8.  For  combinations  of  local  diseases,  usually  select  disease  of  longer  duration  or 
that  first  written. 

9.  Any  definite  disease  accelerated  by  violence  is  to  be  classed  to  the  disease. 

10. Tetanus, septicemia, blood-poisoning, pyemia, or erysipelas following violence
to be classed to tetanus or the septic disease if the injury is slight; but if severe enough
to kill by itself, the death should be classed to the form of violence.
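
Decisions 2, 3, and 4 of these Instructions amount to a ranking by groups: a disease of an earlier group is preferred to one of a later group, and any listed disease to one not listed. The Python sketch below illustrates the ranking with a few members of each group. The names `GROUPS`, `tier`, and `select` are hypothetical, and the treatment of two diseases in the same group (the first written is kept) is an assumption, since the decisions quoted do not settle that case for these groups.

```python
# Preference groups of decisions 2, 3, and 4, abbreviated to a few members
# each; the full lists are given in the text above.
GROUPS = [
    {"aneurysm", "cancer", "smallpox", "tetanus", "strangulated hernia"},
    {"diphtheria", "measles", "scarlet fever", "typhoid fever", "whooping-cough"},
    {"alcoholism", "influenza", "phthisis", "syphilis"},
]


def tier(disease):
    """Return 0, 1, or 2 for the three groups, and 3 for anything else."""
    for rank, members in enumerate(GROUPS):
        if disease in members:
            return rank
    return len(GROUPS)


def select(causes):
    """Choose the cause in the most preferred group; ties keep the first written."""
    return min(causes, key=tier)


print(select(["measles", "cancer"]))
print(select(["influenza", "measles"]))
print(select(["bronchitis", "influenza"]))
```

A fuller treatment would add the age condition of decision 6 and the provisions on violence in decisions 9 and 10, none of which a fixed ranking can express.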

For  returns  upon  the  Standard  Certificate  of  Death,  and  espe- 
cially for  those  returns  in  which  the  instructions  have  been  regarded 
by  the  reporting  physicians,  the  following  suggestions  are  made  by 
the  United  States  Bureau  of  the  Census: 

1.  Select  the  primary  cause,  that  is,  the  real  or  underlying  cause  of  death.  This  is 
usually — 

(a) The cause first in order.

(b) The cause of longer duration. If the physician writes the cause of shorter
duration first, inquiry may be made whether it is not a mere symptom,
complication, or terminal condition.

(c) The cause of which the contributory (secondary) cause is a frequent
complication.

(d) The physician may indicate the relation of the causes by words, although
this is a departure from the way in which the blank was intended to be
filled out. For example, “Bronchopneumonia following measles” (primary
cause last) or “Measles followed by bronchopneumonia” (primary
cause first).

2. If the relation of primary and secondary is not clear, prefer general diseases,
and especially dangerous infective or epidemic diseases, to local diseases.

3.  Prefer  severe  or  usually  fatal  diseases  to  mild  diseases. 

4. Disregard ill-defined causes (Class XVIII), and also indefinite and ill-defined
terms (e. g., “debility,” “atrophy”) in Classes XIV and XV that are referred, for
certain ages, to Class XVIII, as compared with definite causes. Neglect mere modes of
death (failure of heart or respiration) and terminal symptoms or conditions (e. g.,
hypostatic congestion of lungs).

5.  Select  homicide  and  suicide  in  preference  to  any  consequences,  and  severe 
accidental  injuries,  sufficient  in  themselves  to  cause  death,  to  all  ordinary  conse- 
quences. Tetanus  is  preferred  to  any  accidental  injury,  and  erysipelas,  septicemia, 
pyemia,  peritonitis,  etc.,  are  preferred  to  less  serious  accidental  injuries.  Prefer 
definite  means  of  accidental  injury  (e.  g.,  railway  accident,  explosion  in  coal  mine, 
etc.)  to  vague  statements  or  statement  of  the  nature  of  the  injury  only  (e.  g.,  accident, 
fracture  of  skull). 

6.  Physical  diseases  (e.  g.,  tuberculosis  of  lungs,  diabetes)  are  preferred  to  mental 
diseases  as  causes  of  death  (e.  g.,  manic  depressive  psychosis),  but  general  paralysis 
of  the  insane  is  a preferred  term. 

7.  Prefer  puerperal  causes  except  when  a serious  disease  (e.  g.,  cancer,  chronic 
Bright’s  disease)  was  the  independent  cause. 

8.  Disregard  indefinite  terms  and  titles  generally  in  favor  of  definite  terms  and 
titles.  The  precise  line  of  demarcation  is  difficult  to  lay  down,  but  may  be  indicated 
broadly  by  the  kinds  of  type  employed  in  the  International  List  in  the  form  distrib- 
uted by  the  Census  to  all  physicians  in  the  United  States.* 
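
Suggestion 1 (d) turns on the wording the physician uses to connect two causes. The Python sketch below illustrates how the two phrasings quoted there might be read. It is a minimal illustration, not a Census Bureau procedure: only the words "following" and "followed by" are recognized, and the name `split_causes` is hypothetical.

```python
import re

# "X following Y" puts the primary cause last;
# "X followed by Y" puts the primary cause first.
PATTERNS = [
    re.compile(r"^(?P<secondary>.+?)\s+following\s+(?P<primary>.+)$", re.I),
    re.compile(r"^(?P<primary>.+?)\s+followed\s+by\s+(?P<secondary>.+)$", re.I),
]


def split_causes(statement):
    """Return (primary, secondary), or None if neither phrasing is present."""
    text = statement.strip().rstrip(".")
    for pattern in PATTERNS:
        match = pattern.match(text)
        if match:
            return match.group("primary"), match.group("secondary")
    return None


print(split_causes("Bronchopneumonia following measles"))
print(split_causes("Measles followed by bronchopneumonia"))
print(split_causes("Typhoid fever"))
```

Statements that use other connecting words ("due to," "after," and so on) would need patterns of their own, and any statement the sketch cannot read is left for the remaining suggestions to decide.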

From  these  suggestions  and  from  the  instructions  employed  in 
various  offices  it  will  be  apparent  that  there  is  a considerable  factor 
of  uncertainty  in  the  results  when  a large  proportion  of  joint  causes 
is  involved.  No  rules  yet  formulated  will  insure  absolutely  iden- 
tical compilations  from  the  same  material,  and  the  methods  em- 
ployed in  the  same  office  may  vary  from  year  to  year.  The  most 
efficient  editor  is  not  the  one  who  follows  any  set  of  listed  arbitrary 
decisions,  but  rather  the  one  who  is  constantly  on  the  lookout  for 
cases  in  which  it  should  not  be  followed,  and  who  calls  attention  to 
such  cases.  A list  of  this  kind  cannot  incorporate  considerations 
of  duration,  sex,  place  of  death,  age,  occupation,  etc.,  any  or  all  of 
which  may  have  an  important  bearing  upon  the  classification  of 
deaths,  and  in  individual  cases  such  data  on  transcripts  often 
indicate  an  assignment  contrary  to  the  listed  one. 

* See Physicians’ Pocket Reference to the International List of Causes of Death.

The whole subject of joint causes is a difficult one, and there is
still no international agreement about the matter. The International
Commission, however, refuses to recommend the formation
of any general, uniform, international code for joint causes. This
position is based upon arguments which seem to many competent
vital statisticians singularly narrow and lacking in any deep
understanding of the basic philosophy and psychology underlying the
recording and tabulating of the causes of death. It is generally
admitted that the work of the United States Census Bureau is the
most advanced, on this matter, of any country in the world. And
yet, as the late Dr. William H. Davis has said, “the treatment of
joint causes of death has never been adequately discussed by
international conferences, nor been adequately treated by anybody.”

RELIABILITY  OF  STATISTICS  OF  SEPARATE  CAUSES  OF  DEATH 

Philosophically considered, a true determination of the “cause
of death” is in a great many cases, indeed the majority probably of
all  cases,  an  extraordinarily  difficult  matter.  This  every  patho- 
logic anatomist  knows.  The  difficulty  arises  from  many  different 
circumstances.  Some  illustrations  will  perhaps  make  the  point 
clear.  A woman  has  cancer  of  the  breast,  is  operated  upon  in  the 
hope  of  curing  this  disease,  develops  a postoperative  pneumonia, 
and dies. Now if the woman had not had the cancer and had therefore
not been operated on for its relief, this train of circumstances
would  not  have  got  under  way.  This  way  of  looking  at  the  matter 
plainly  suggests  that  the  cancer  is  fundamentally  the  cause  of 
this  death.  But,  on  the  other  hand,  if  she  had  not  been  operated 
on,  even  though  she  still  had  the  cancer,  she  would  not  have  died 
when  she  did,  but  at  some  later  time.  This  view  rather  tends  to 
make  the  operation  the  cause  of  death,  at  least  at  the  particular 
time  and  place  at  which  it  occurred.  Again,  suppose  she  had  been 
operated  on,  and  had  not  developed  the  postoperative  pneumonia. 
Then  she  might  have  been  permanently  cured  of  the  cancer  (some 
are)  and  lived  to  a ripe  old  age.  This  view  of  the  case  truly  makes 
the  pneumonia  the  cause  of  death.  Which  of  the  three  things — 
cancer,  operation,  or  pneumonia — is  to  be  charged  as  the  primary 
cause  of  death  plainly  depends  upon  the  point  of  view,  or,  put  in 
another  way,  upon  what  definitions  or  rules  are  set  up  as  to  what 
shall  be  called  the  cause  of  death. 

As has already been shown, official vital statistics operate under
such a set of rules. In the case cited, cancer would be given as the
primary  cause  of  death,  and  the  postoperative  pneumonia  as  the 
secondary  or  complicating  cause.  To  the  philosophic  mind  this  is 
probably  the  least  satisfactory  solution  of  the  three.  Why  it  is  the 
officially  chosen  one  is  because  of  an  often  overlooked,  and  in  some  of 
its  aspects  quite  vicious,  underlying  concept  in  official  vital  statistics. 
There is ever present in vital statistics, and from the beginning always
has been, an attempt to make the incidence of mortality a measure or
index  of  the  incidence  of  morbidity.  Mortality  is  not  and  never  can 
be  a good  index  of  morbidity,  generally  speaking.  What  actually 
is  done  is  to  weaken  and  impair  the  value  of  the  statistics  for  the 
study  of  mortality  in  the  hope  to  make  them  a little  better  indices  of 
morbidity.  This  tendency  is  apparent  in  the  illustration  given 
above.  It  is  thought  desirable  to  get  as  complete  records  as 
possible  of  the  prevalence  of  cancer  in  the  population,  as  a disease. 
Therefore,  the  rule  is  that,  in  general,  if  a person  dies  who  is  known 
to  have  had  cancer  prior  to  death,  the  death  is  to  be  charged  to 
cancer.  In  consequence,  it  results  that  no  one  can  get  from  the 
official  statistics  an  accurate  answer  to  the  question:  “How  many 
persons  per  1000  living  did  cancer  kill  in  1920?”  Instead,  what  he 
gets  is  information  as  to  how  many  persons  died  per  1000  living  in 
1920,  who  had  had  cancer  before  they  died,  assuming  that  the 
diagnosis  is  correctly  made  in  every  case.  The  latter  information, 
as  anyone  with  a logical  mind  will  at  once  perceive,  is  quite  different 
from  the  former. 

Now  if  all  secondary  and  complicating  conditions  were  accu- 
rately reported  and  compiled,  the  case  would  be  far  better  in  respect 
of  the  objection  just  discussed.  But  this  is  an  unattainable  counsel 
of  perfection.  Even  if  it  were  accomplished  there  would  still 
remain  a large  source  of  error  in  statistics  of  the  causes  of  death. 
This  arises  from  the  fact  that  all  physicians  are  not  equally  intelli- 
gent or  clever  diagnosticians.  Clinical  diagnosis  is  not  yet  an 
exact  science.  A person  dies:  the  attending  physician  quite 
honestly  thinks  he  knows  what  this  patient  died  of,  and  registers 
his  conviction  on  the  death  certificate.  Actually,  the  physician 
may  have  been  mistaken  in  his  diagnosis,  too  often  grossly  so. 
But his error gets embalmed in the official vital statistics.

This phase of the problem has been the subject of careful
study by a committee of the American Public Health Association.5
Every  student  of  vital  statistics  should  study  and  ponder  over  this 
committee’s  report.  He  will  be  bound  to  reach  the  conclusion  that 
there  are  but  few  indeed  of  the  rubrics  of  the  International  List 
whose  figures  can  be  unreservedly  accepted  at  their  face  value. 

The  following  classes  of  official  vital  statistics  alone  can,  in  the 
writer’s  opinion,  be  subjected  to  analysis  as  scientifically  accurate 
records  of  natural  phenomena: 

1. Deaths from all causes (either for all ages together or for
separate age groups, as, for example, “infant mortality”
(deaths under one year of age)).

2.  Traumatism  (Rubrics  178,  180  to  188  inclusive,  192  to  194 
inclusive,  and  196  to  198  inclusive). 

3.  Homicide  (Rubrics  172  to  175  inclusive). 

This  is  neither  a long  nor,  except  in  its  first  item,  a specially 
important  list.  But  when  we  deal  with  other  rubrics  we  are  dealing 
with  mixtures  of  unknown  composition,  and  with  data  of  a wholly 
different  order  of  accuracy  than  those,  for  example,  of  the  physicist 
or  the  chemist.  We  are  forced,  of  course,  in  the  practical  conduct 
of  a statistical  business  to  deal  with  other  rubrics,  but,  at  any  rate, 
one  should,  when  so  doing,  always  remember  that  the  material  is 
fundamentally  of  a dubious  character.  In  the  discussion  of  this 
point  in  an  earlier  edition  of  this  book  suicide  was  included  as  a 
fourth  rubric  in  the  short  list  above.  It  cannot  withstand  criticism, 
however,  as  undoubtedly  the  fact  of  suicide  is  sometimes  concealed 
by  surviving  members  of  the  family;  how  often  no  one  knows. 
A similar  objection  may  be  made  about  homicide.  But  probably 
in  case  of  both  suicide  and  homicide  the  error  in  the  returns  so 
produced  is  statistically  negligible. 

Professor  Haven  Emerson,  of  Columbia  University,  who  is  one 
of  the  foremost  authorities  in  the  world  on  the  accuracy  of  certified 
causes of death, suggests (in litt.) that

“certain  rubrics  of  the  International  List  may  be  considered  reasonably  accurate  if 
they  deal  with  neoplasms,  lesions,  conditions  verifiable  by  direct  observation  of  the 
surface  of  the  body  or  its  interior  where  accessible  by  inspection  through  body  orifices 
or where operative procedures have made the interior tissues visible by direct inspection,
or where there is a specific test or organism determining the cause of death, as
the  diphtheria  bacillus,  the  Plasmodium  malariae,  lead  or  alcohol  poisoning.” 


This  is  undoubtedly  true  where  the  interest  and  painstaking  care 
of  the  physician  in  filling  out  death  certificates  can  be  counted  on. 
But  the  testimony  of  registrars  is  that  these  qualities  are  by  no 
means  universal.  Dr.  Emerson’s  suggestion  defines  a set  of  condi- 
tions which  permit  the  reasonably  accurate  recording  of  the  cause 
of  death,  but  they  do  not  compel  it,  and  these  would  seem  to  be  two 
different  things. 

There  are  still  other  factors  in  the  case,  as  Dr.  Emerson  goes  on 
to point out:

“Where, as in France, the report is made by the head of the household for purposes
of the état civil, there is a wider error than where, as in Switzerland, there is a
separation  of  the  factor  identity  of  a death  from  the  pathological  report  by  a physician 
of  the  cause  of  death. 

“As  I recall  it  the  late  Dr.  Ney  of  Switzerland  found  that  the  installation  of  the 
present  system  where  the  confidential  character  of  the  certificate  of  cause  of  death 
was  scrupulously  maintained,  and  thus  the  physician  was  protected  against  civil 
damage  suits  by  the  family  of  the  deceased,  resulted  in  an  increase  in  some  Cantons 
of from 50 to 70 per cent. in the certification of syphilis as a cause of death.

“I  believe  the  Swiss  system  has  been  for  the  past  ten  to  fifteen  years  superior  in 
principle  and  practice  to  that  of  any  other  country.” 


At  this  point  the  discussion  of  the  interesting  problem  of  the  accu- 
racy of  the  statistical  recording  of  causes  of  death  must  be  dropped, 
because  of  considerations  of  space.  It  should  be  emphasized,  in 
conclusion,  that  there  is  no  subject  of  greater  importance  in  the 
whole  field  of  vital  statistics,  and  there  is  no  subject  on  which 
further  research  is  more  needed  and  will  be  more  permanently 
profitable and useful.

CHAPTER IV


TABULAR  PRESENTATION  OF  STATISTICAL  DATA 

The  raw  material  of  statistics  consists  of  individual  observations 
of  phenomena.  The  simplest  way  to  tabulate  such  material  is,  of 
course,  to  make  a list  of  the  observations,  in  which  each  single  one 
constitutes  an  item  of  the  table.  But  this  can  scarcely  be  called 
tabulation,  because  it  does  not  perform  the  essential  function  of  that 
operation. 

The  purpose  of  tabulation  is  so  to  arrange  observations  that  like 
cases  shall  be  put  together  and  their  frequency  of  occurrence  in  the 
whole  group  thus  be  made  apparent. 

The  degree  of  likeness  of  the  cases  to  be  put  together  may  be 
defined  quantitatively  in  any  way  one  likes.  For  example,  it  may 
be  decided  for  purposes  of  tabulation  to  call  all  men  whose 
stature  falls  anywhere  between  65.00  and  65.99  inches,  alike  in 
stature,  and  put  them  in  the  same  class.  Evidently,  then,  the  first 
necessary  step  in  tabulating  observations  after  they  have  been 
collected  is  to  classify  them,  quantitatively  if  possible. 


DICHOTOMOUS  CLASSIFICATION 


Logically  considered,  classification  is  the  process  of  partitioning 
a universe  into  mutually  exclusive  categories  or  compartments.  The 
number  of  such  compartments  may  be  anything  from  two  up.  If 
it  is  exactly  two,  the  classification  is  called  dichotomous.  This  is 
the  alternative  category  type  of  classification.  At  the  moment  of 
this  writing: 


Every living person in the world
    either has smallpox
    or does not have smallpox


So then it is possible to put every person into his proper
compartment relative to this classification.

But this process can be continued indefinitely:


Every living person either

    Has smallpox and
        Has a fever and
            Has an automobile, n1
            or
            Has no automobile, n2
        or
        Has no fever and
            Has an automobile, n3
            or
            Has no automobile, n4

or

    Does not have smallpox and
        Has a fever and
            Has an automobile, n5
            or
            Has no automobile, n6
        or
        Has no fever and
            Has an automobile, n7
            or
            Has no automobile, n8


If  at  the  end  of  such  a process  of  dichotomizing  the  number  of 
cases  in  each  of  the  final  classes  be  counted,  we  shall  have  the 
frequency  of  occurrence  of  individuals  alike  in  the  respects  indicated 
by the line of the classification back to the start. Thus in the
example given above we may contrast the n1 persons in the condition
of having smallpox, and fever, and an automobile, with the n8
individuals who have wholly escaped this concatenation of disasters.
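
The counting described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the six records are invented, and the names `ATTRIBUTES`, `people`, and `dichotomize` are hypothetical, not part of the original text. The three attributes are applied in the same order as in the scheme above, so the eight terminal classes come out in the order n1 to n8.

```python
from collections import Counter
from itertools import product

# Attributes are applied in this order, mirroring the scheme in the text.
ATTRIBUTES = ("smallpox", "fever", "automobile")

# Hypothetical records, invented purely for illustration.
people = [
    {"smallpox": True, "fever": True, "automobile": True},
    {"smallpox": True, "fever": True, "automobile": False},
    {"smallpox": False, "fever": True, "automobile": False},
    {"smallpox": False, "fever": False, "automobile": True},
    {"smallpox": False, "fever": False, "automobile": False},
    {"smallpox": False, "fever": False, "automobile": False},
]


def dichotomize(records, attributes=ATTRIBUTES):
    """Return the frequency of every terminal class, empty classes included."""
    observed = Counter(tuple(r[a] for a in attributes) for r in records)
    all_classes = product((True, False), repeat=len(attributes))
    return {cls: observed.get(cls, 0) for cls in all_classes}


frequencies = dichotomize(people)
for number, (cls, n) in enumerate(frequencies.items(), start=1):
    description = ", ".join(
        a if has else f"no {a}" for a, has in zip(ATTRIBUTES, cls)
    )
    print(f"n{number}: {description}: {n}")
```

Each added attribute doubles the number of terminal classes, which is why the sketch enumerates all 2^k combinations rather than only those that happen to be observed.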

An  example  of  a table  of  this  sort  is  presented  as  Table  2.  It  is 
based  upon  data  collected  to  determine  the  incidence  of  influenza 
among  tuberculous  and  non-tuberculous  persons  in  the  same  family 
during  the  influenza  pandemic  of  1918  (cf.  Pearl3). 


TABLE 2

Showing the Incidence of Influenza Among Tuberculous and Non-tuberculous
White Individuals, Arranged by Presence or Absence of Other
Cases of Influenza

              Tuberculous, 2375.                         Not tuberculous, 8820.

   Influenza, 595.      No influenza, 1780.    Influenza, 1971.      No influenza, 6849.

  Other     No other     Other     No other     Other     No other     Other     No other
 cases in   cases in    cases in   cases in    cases in   cases in    cases in   cases in
household, household,  household, household,  household, household,  household, household,
   460        135         533        1247        1788       183         2568       4281

From Table 2 we note that of the 2375 tuberculous persons,
595, or 25 per cent., had influenza, while 1780, or 75 per cent., did
not have this disease during the epidemic. Of the 8820
non-tuberculous individuals living in the same households as the
tuberculous, 1971, or 22.3 per cent., had influenza, and 6849, or 77.7 per
cent., did not have it. It therefore appears that, under the same
environmental conditions of living, only 2.7 per cent. more of the
tuberculous individuals than of the non-tuberculous contracted
influenza during the epidemic.

Of  the  595  tuberculous  persons  who  had  influenza,  460,  or  77.3 
per  cent.,  were  in  households  where  at  least  one  other  person  also 
had  influenza  during  the  epidemic.  Of  the  1971  non-tuberculous 
persons  who  had  influenza,  on  the  other  hand,  1788,  or  90.7  per 
cent.,  were  in  households  where  at  least  one  other  person  also  had 
influenza. Or, in other words, 22.7 per cent. of the tuberculous
who had influenza were the only cases of the latter disease in their
households, while only 9.3 per cent. of the non-tuberculous who had
influenza were the sole cases in the household.

Of  1780  tuberculous  persons  who  did  not  have  influenza  during 
the  epidemic,  only  533,  or  29.9  per  cent.,  were  exposed  to  influenza 
infection  in  the  household,  whereas  of  the  6849  non-tuberculous 
persons  who  did  not  have  influenza,  2568,  or  37.5  per  cent.,  were 
exposed  to  infection  within  the  household. 

These  examples  will  suffice  to  show  how  a simple  dichotomous 
statistical  table  is  to  be  read. 
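
The percentages read off Table 2 above can be checked with a short Python sketch. The frequencies are those of Table 2; the dictionary layout and the function names (`subtotal`, `attack_rate`, `share_with_other_cases`) are assumptions made only for this illustration. Percentages are rounded to one decimal place, so the first figure appears as 25.1 where the text gives the round number 25.

```python
# Frequencies transcribed from Table 2.
TABLE_2 = {
    "tuberculous": {
        "influenza": {"other_cases": 460, "no_other_cases": 135},
        "no_influenza": {"other_cases": 533, "no_other_cases": 1247},
    },
    "not_tuberculous": {
        "influenza": {"other_cases": 1788, "no_other_cases": 183},
        "no_influenza": {"other_cases": 2568, "no_other_cases": 4281},
    },
}


def percent(part, whole):
    """Percentage of `whole` represented by `part`, to one decimal place."""
    return round(100 * part / whole, 1)


def subtotal(group, status):
    """Primary subtotal: everyone in `group` with the given influenza status."""
    return sum(TABLE_2[group][status].values())


def attack_rate(group):
    """Per cent of the group that had influenza during the epidemic."""
    had = subtotal(group, "influenza")
    return percent(had, had + subtotal(group, "no_influenza"))


def share_with_other_cases(group, status):
    """Per cent of a subgroup living in households with other influenza cases."""
    return percent(TABLE_2[group][status]["other_cases"], subtotal(group, status))


for group in TABLE_2:
    print(group, "attack rate:", attack_rate(group))
    for status in ("influenza", "no_influenza"):
        print("  ", status, "with other cases in household:",
              share_with_other_cases(group, status))
```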

Now  instead  of  dividing  the  residual  universe  into  just  two 
parts  each  time  we  may  equally  well  divide  it  into  a number  of 
parts.  This  leads  to  some  sort  of  linear  classification. 

Such  a linear  classification  and  tabulation  based  thereon  may  be 
combined  terminally  with  a preceding  dichotomous  table,  and  this 
often  furnishes  a useful  form  of  statistical  tabulation.  An  example 
is  given  in  Table  3,  which  is  an  expansion  of  Table  2.  The  linear 
classification  in  this  case  is  relative  to  the  number  of  persons  in 
the  household,  proceeding  from  1 to  15,  down  the  first  or  left- 
hand  column  of  the  table. 

It  will  be  noted  at  once  that  this  expansion  by  size  of  household 
throws  interesting  and  significant  light  upon  the  results  stated 
above  from  the  more  meager  distributions  of  Table  2.  The  manner 
in  which  this  is  accomplished  may  be  left  to  the  reader  to  work 
out  for  himself  as  a useful  exercise  in  getting  familiar  with  the 
reading  of  statistics. 

TABLE 3

Showing the Incidence of Influenza Among Tuberculous and Non-tuberculous
White Individuals, Arranged (A) by Number of Persons in Household,
and (B) by Presence or Absence of Other Cases of Influenza

                         Tuberculous.                            Not tuberculous.

                Influenza.        No influenza.         Influenza.        No influenza.

Number in      Other  No other     Other  No other     Other  No other     Other  No other
household.  cases in  cases in  cases in  cases in  cases in  cases in  cases in  cases in
              house-    house-    house-    house-    house-    house-    house-    house-
               hold.     hold.     hold.     hold.     hold.     hold.     hold.     hold.

 1                                              14
 2                 4        10        12       108         4        15         7       100
 3                46        39        38       161        76        22       118       292
 4                72        28        81       255       168        37       243       696
 5                89        27        78       221       262        21       363       749
 6                73        16        96       210       303        29       419       822
 7                71         9        83       123       358        18       480       636
 8                51         2        68        82       257        24       414       446
 9                22         3        40        33       117         8       188       246
10                18         1        16        20       114         3       138       170
11                 8                  12        12        49         5        91        43
12                 3                   5         5        36         1        63        43
13                 2                   3         2        32                  28        24
14
15                 1                   1         1        12                  16        14

Totals.          460       135       533      1247      1788       183      2568      4281

                       595                1780                1971                6849

                                 2375                                    8820

The linear classification of Table 3, by number of persons in
the household, presents no problem for decision as to where each
case belongs in making the table. There are no fractional
components of a household. Each will have 2, or 4, or some other definite
and simple whole number of individuals in it. When, however, we
deal with things which are measured, instead of counted, a new
element enters the tabulating situation. This may be illustrated
by Table 4, which is a simple statistical table based upon a linear
classification.

TABLE 4

Frequency Distribution of Systolic Blood-pressures in 102 Men Aged
Seventy-five and Over. (From Thompson and Todd, Lancet, 1922, II, 503.)

Systolic pressure      Absolute
(mm. Hg).              frequency.

110-129                    18
130-149                    31
150-169                    23
170-189                    20
190-209                     7
210-229                     1
230-249                     2

Total                     102

In Table 4 the observed systolic blood-pressures are divided
into seven mutually exclusive classes. Each class includes an
elemental range of 20 mm. pressure. This classification says that
systolic pressures of between say 130 and 150 mm. are to be regarded
for practical purposes as alike. The correct way to state class
limits in setting up a frequency table is that followed in Table 4.
The class range 110-129 means theoretically that all pressures are
included which are equal to or greater than 110.0000 . . . and are
equal to or less than 129.9999. . . .
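
The class-limit convention just described can be sketched in Python. The function name `class_label`, its default arguments, and the list of readings are assumptions invented for illustration; the readings are not the Thompson and Todd data. A class written 110-129 is treated as covering every pressure from 110 up to, but not including, 130.

```python
import math
from collections import Counter


def class_label(pressure_mm, start=110, width=20):
    """Return the class (e.g. '110-129') into which a measured pressure falls.

    A class labelled 110-129 takes in all values from 110.000... up to
    129.999..., that is, the half-open interval [110, 130).
    """
    index = math.floor((pressure_mm - start) / width)
    lower = start + index * width
    return f"{lower}-{lower + width - 1}"


# Hypothetical readings in mm. Hg, invented for illustration only.
readings = [110.0, 129.9, 130.0, 148.5, 151.2, 171.0, 236.4]

distribution = Counter(class_label(x) for x in readings)
for label in sorted(distribution, key=lambda s: int(s.split("-")[0])):
    print(label, distribution[label])
```

Because the classes are half-open intervals, every reading falls into exactly one class, which is the mutual exclusiveness the text requires.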

In  grouping  observations  in  this  way  we  are  doing  essentially 
the  same  thing  that  is  done  in  measuring  when  the  graduations  on 
the  measuring  scale  have  a defined  degree  of  coarseness.  Suppose, 
for  example,  each  one  of  a group  of  men  to  be  measured  as  to  height 
with  a stick  graduated  only  to  inches.  Some  few  men  in  the  group 
will  be  of  a height  which  exactly  coincides  with  one  of  the  inch 
markings  on  the  stick  and  their  height  will  be  that  exact  number  of 
inches.  More  of  them,  however,  will  have  a height  falling  some- 
where in  between  two  consecutive  inch  marks  on  the  stick.  Say 
there  are  four  men  whose  height  falls  between  the  72-inch  and  73- 
inch  mark.  These  four  men  differ  from  each  other  in  height  by 
less  than  an  inch.  If  we  have  agreed  at  the  start  to  measure  only 
to  the  fineness  of  1 inch,  this  is  equivalent  to  saying  that  we  pro- 
pose to  regard  individuals  differing  from  each  other  by  less  than 
an  inch,  as  being  of  the  same  height. 

Scales can be read in two different ways. Thus an individual
whose actual height is 72.2 inches may be said to be, on the basis
of measuring with a scale divided into inches only:

either (a), more than 72 inches in height but less than 73 inches;
or (b), nearer 72 inches than 73 inches in height.

In practical statistical work it makes some difference which of
these methods of recording scale readings is adopted. A group of
individuals in our statistics recorded as 72 inches in height according
to the first method (a) of reading and recording will include
individuals ranging in height between 72.0000 ... 01 inches and
72.9999 ... 9 inches. On this method of reading the central point
of the group will be, to a practical degree of approximation, 72.5
inches. But a group of individuals recorded as 72 inches in height,
on the second (b) method of reading and recording, will include
individuals ranging in height between 71.5000 ... 01 inches and 72.4999
... 9 inches. The central point of the group, read and recorded in
this way, will be, again to a practical degree of approximation, 72
inches.
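
The two ways of reading a scale can be contrasted in a small Python sketch. The function names are hypothetical, and the treatment of a height lying exactly half way between two marks (it is sent to the higher mark) is an assumption of this sketch, not something the text prescribes.

```python
import math


def record_method_a(height_in):
    """Method (a): record the last inch mark passed."""
    return math.floor(height_in)


def record_method_b(height_in):
    """Method (b): record the nearest inch mark.

    Exact halves are sent to the higher mark; this is an assumption
    made for the sketch.
    """
    return math.floor(height_in + 0.5)


def central_point(recorded_inches, method):
    """Approximate central point of the group recorded at a given mark."""
    if method == "a":
        return recorded_inches + 0.5   # group spans 72 up to 73
    if method == "b":
        return float(recorded_inches)  # group spans 71.5 up to 72.5
    raise ValueError("method must be 'a' or 'b'")


for height in (72.2, 72.7):
    print(height, record_method_a(height), record_method_b(height))
```

A height of 72.2 inches is recorded as 72 under either method, whereas 72.7 inches is recorded as 72 under (a) and 73 under (b); the same recorded figure therefore stands for different central values according to the method used.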

In  vital  statistics  this  point  is  perhaps  of  greatest  practical 
importance  relative  to  the  recording  of  age.  Here  method  (a) 
records  “age  at  last  birthday,”  while  method  (b)  records  “age  at 
nearest  birthday.”  After  a somewhat  vacillating  policy  in  the 
past,  official  vital  statisticians  (for  example,  the  Census  Bureau) 
have  now  adopted  method  (a)  as  a definite  policy  in  their  work. 
This  obviously  makes  for  greater  accuracy.  If  a person  at  a given 
moment  is  near  the  half  way  point  between  two  consecutive  birth- 
days it  is  not  easy,  without  careful  figuring,  to  say  which  of  the 
two  is  nearer.  But  if  he  knows  his  age  at  all  he  can  instantly  say 
what  it  was  at  his  last  birthday.  When  really  accurate  work  with 
statistical  data  is  attempted  it  is  always  necessary  to  be  sure  whether 
method (a) or method (b) was used in the original records.
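
Applied to age, the distinction between the two methods may be sketched as follows. The function names and the example dates are hypothetical. Two assumptions are made for the sketch: a person born on February 29 is given March 1 as birthday in ordinary years, and a person exactly midway between two birthdays is assigned the earlier one.

```python
from datetime import date


def birthday_in_year(birth, year):
    """Birthday falling in `year` (Feb. 29 becomes Mar. 1 in ordinary years)."""
    try:
        return birth.replace(year=year)
    except ValueError:
        return date(year, 3, 1)


def age_last_birthday(birth, on):
    """Method (a): completed years of age on the date `on`."""
    age = on.year - birth.year
    if on < birthday_in_year(birth, on.year):
        age -= 1
    return age


def age_nearest_birthday(birth, on):
    """Method (b): age at whichever birthday lies nearer to the date `on`."""
    last = age_last_birthday(birth, on)
    previous_birthday = birthday_in_year(birth, birth.year + last)
    next_birthday = birthday_in_year(birth, birth.year + last + 1)
    if (on - previous_birthday) <= (next_birthday - on):
        return last
    return last + 1


born = date(1880, 3, 10)        # hypothetical date of birth
observed = date(1927, 10, 1)    # hypothetical date of record
print(age_last_birthday(born, observed), age_nearest_birthday(born, observed))
```

Method (a) needs only one comparison of dates, while method (b) requires the interval to both neighbouring birthdays, which illustrates why the former is the easier to report correctly.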

DOUBLE  DICHOTOMOUS  TABLES 

The principle of dichotomous classification, with expansion of
terminal classes linearly, may be applied to both sides of a table.
There will then result what may be called a double dichotomous
table, which is one of the most useful forms of tabulation for raw,
basic statistical data. This is so because it permits the greatest
freedom and variety in the subsequent constructive and derivative
use of the material.

TABLE 5

Original Data on Color, Sex, Age, and Location of Lesions of 358 Persons
Found at Autopsy to Have Miliary Tuberculosis

Table 5 is a simple example of a double dichotomous table.
This table presents certain information derived from the autopsy
protocols of 358 persons found at autopsy in the Johns Hopkins
Hospital to have miliary tuberculosis of some organ or organs of
the body. There are 8 × 12 = 96 elemental cells in this table.
Each cell tells the number (i. e., the frequency) of individuals in the
total universe of 358 who were alike in the following respects:

1.  Color. 

2.  Sex. 

3.  Age  (in  broad  classes). 

4.  Presence  (or  absence)  of  tuberculous  lesions  in  lungs. 

5.  Presence  (or  absence)  of  tuberculous  lesions  in  heart. 

6.  Presence  (or  absence)  of  tuberculous  lesions  in  kidneys. 

Furthermore, the frequency of every possible combination of
these categories is stated in Table 5.

This  table  will  repay  careful  and  detailed  study  from  the  stand- 
point of  statistical  methodology.  First,  let  us  see  by  some  examples 
how  it  is  to  be  read. 

(Single cell reading). There was 1 colored male with miliary
tuberculosis, falling in the age class twenty to forty-nine years, who
had no tuberculous lesions in either kidneys or lungs, but did have
a tuberculous lesion of the heart.

(Primary subtotal reading). There were 15 white males aged
fifty or over among the 358 persons who had miliary tuberculosis.

(Secondary subtotal reading). There were but 4 persons, in the
358 who had miliary tuberculosis, who had a tuberculous lesion
of the heart, but at the same time lacked any such lesion of the
lungs.

(Tertiary subtotal reading). There were 123 white and 235
colored persons in this experience of miliary tuberculosis.

It  is  obvious  that  this  form  of  table  may  be  expanded  to  any 
desired  degree.  The  double  dichotomous  type  of  table  leads  up  to 
and  exemplifies  the  theoretical  ideals  of  statistical  tabulation. 
These ideals, always to be kept in mind in tabulating raw statistical
data as a matter for reference and possible future synthetic or
derivative use, are:

1.  Make  the  information  in  each  cell  exclusive  relative  to  as 
many  different  categories  as  is  possible,  while  still  conforming  to  the 
ideal  of 

2.  Making  a tabulation,  not  a mere  list. 

The first of these ideals perhaps needs a little further illustration
to make its meaning entirely clear. The records of the Baltimore
Health Department for 1917 show that in that year there died
223 bookkeepers and clerks and 124 drivers and hostlers.

The  same  records  also  show  that  in  the  same  year  there  died 
1213  persons  of  tuberculosis  of  the  lungs. 

But  it  is  impossible  to  determine  from  the  records  how  many 
of  the  bookkeepers  or  of  the  hostlers  died  of  tuberculosis  of  the 
lungs.  Some  part  surely  of  the  223  bookkeepers  and  the  124 
drivers  and  hostlers  had  tuberculosis.  Why  it  is  impossible  from 
the  published  tabulations  to  find  out  how  many  were  in  this  part, 
is  that  the  elemental  cells  of  each  of  the  published  tables  are  too 
inclusive. Two hundred and twenty-three and 124 are elemental
cell  frequencies  of  the  published  table  of  deaths  by  occupations, 
and  1213  is  an  elemental  cell  frequency  in  the  published  table  of 
deaths  by  causes.  But  the  223  persons  of  the  first  mentioned  cell 
are alike in only one respect, namely, that they were all either clerks
or  bookkeepers.  They  included  males  and  females,  whites  and 
colored,  persons  dying  of  tuberculosis,  cancer,  etc.  In  short,  the 
information  is  exclusive  relative  only  to  one  single  category.  This 
may  be  satisfactory  or  desirable  in  derivative  tables  of  constants 
and  the  like,  but  it  is  eminently  unsatisfactory  in  original  tables 
of  the  raw  statistical  material. 
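
The point that separately published one-way tables cannot yield the combined frequencies may be made concrete with a Python sketch. Both series of records below are invented for illustration and have nothing to do with the Baltimore figures; the names `one_way_tables` and `two_way_table` are likewise hypothetical. The two series give identical tables by occupation and by cause, yet differ entirely in how many of each occupation died of each cause.

```python
from collections import Counter

# Two hypothetical series of death records, each record being
# (occupation, cause of death). Invented purely for illustration.
series_a = (
    [("clerk", "tuberculosis")] * 3 + [("clerk", "cancer")] * 1
    + [("driver", "tuberculosis")] * 1 + [("driver", "cancer")] * 3
)
series_b = (
    [("clerk", "tuberculosis")] * 1 + [("clerk", "cancer")] * 3
    + [("driver", "tuberculosis")] * 3 + [("driver", "cancer")] * 1
)


def one_way_tables(records):
    """Separate tables by occupation and by cause (cells too inclusive)."""
    by_occupation = Counter(occupation for occupation, _ in records)
    by_cause = Counter(cause for _, cause in records)
    return by_occupation, by_cause


def two_way_table(records):
    """Cells exclusive relative to occupation and cause together."""
    return Counter(records)


print(one_way_tables(series_a) == one_way_tables(series_b))  # same margins
print(two_way_table(series_a)[("clerk", "tuberculosis")])
print(two_way_table(series_b)[("clerk", "tuberculosis")])
```

Only the table whose cells are exclusive relative to both categories distinguishes the two series, which is the reason for tabulating raw material as finely as the numbers permit.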

CORRELATION  TABLES 

A table of double entry in which the condition or status of each
individual is entered relative to two characteristics or attributes
simultaneously is called a correlation table. This type of table is one
of the most important in statistical work. An example of a correlation
table is shown in Table 6. This table correlates the relative
cell volume of the blood (volume of corpuscles as percentage of
total volume) with body-weight, in 449 males having active
tuberculosis.*

As  an  illustration  of  the  manner  in  which  correlation  tables  are
to  be  read,  it  is  seen  from  Table  6 that,  in  this  experience,  there  were
20  males  whose  body  weight  fell  somewhere  within  the  scale  range
extending  from  129.5  pounds  to  just  under  139.5  pounds  (denoted
in  the  left  marginal  rubrics  as  129.5-139.4);  and  these  same  20
males  exhibited  relative  cell  volumes  of  their  blood  falling  within
the  scale  range  extending  from  41.5  to  just  under  43.5  per  cent
(denoted  in  the  marginal  rubrics  along  the  top  as  41.5-43.4  per
cent).

* Pearl,  R.,  and  Miner,  J.  R.:  A Biometric  Study  of  the  Relative  Cell  Volume  of
Human  Blood  in  Normal  and  Tuberculous  Males,  Bull.  Johns  Hopkins  Hosp.,  vol.
40,  pp.  3-32,  1927.  Table  on  p.  26.

TABLE  6

Correlation  of  Relative  Cell  Volume  with  Body-weight  Among  Actively
Tuberculous  Males

                                        RELATIVE CELL VOLUME (per cent)

BODY-WEIGHT   25.5-  27.5-  29.5-  31.5-  33.5-  35.5-  37.5-  39.5-  41.5-  43.5-  45.5-  47.5-  49.5-  51.5-
(pounds)       27.4   29.4   31.4   33.4   35.4   37.4   39.4   41.4   43.4   45.4   47.4   49.4   51.4   53.4  Totals

 89.5- 99.4 †
 99.5-109.4       1      1      1      1      -      3      4      2      4      3      -      -      -      1      21
109.5-119.4       -      -      -      3      -      5      4      7      8      4      3      3      1      2      40
119.5-129.4       2      -      -      1      2      6      8     10     10     16     15      7      5      2      84
129.5-139.4       -      -      2      3      2      7      6      9     20     19     15     14      6      3     106
139.5-149.4       -      -      -      -      1      -      3     13     17     22     22     10      6      1      95
149.5-159.4       -      -      -      2      1      -      2      3      4      9     17      3      4      1      46
159.5-169.4       -      -      -      1      1      1      -      2      8      3      5      4      1      -      26
169.5-179.4       -      -      -      -      -      1      -      1      -      -      4      1      4      -      11
179.5-189.4       -      -      -      -      -      1      -      -      2      2      3      1      1      1      11
189.5-199.4 †
199.5-209.4 †
209.5-219.4 †

Totals            3      2      3     11      8     24     30     48     74     78     84     43     29     12     449

† The columns in which the entries of these rows stand are not preserved in this copy
of the table. In column order the entries are: 89.5-99.4: 1, 1, 3, 1 (total 6);
189.5-199.4: 1, 1 (total 2); 199.5-209.4: no entries; 209.5-219.4: 1 (total 1).
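
The reading of Table 6 described above amounts to a double classification: each individual is placed in one body-weight class and one cell-volume class, and the cell where the two meet is increased by one. A minimal Python sketch of that tally follows. The class limits are taken from the rubrics of Table 6; the five observations and the helper name class_label are hypothetical and serve only as illustration.

```python
from collections import Counter


def class_label(value, start, width):
    """Return the class interval 'lower-upper' that contains value.

    Classes begin at `start` and are `width` wide. The printed upper limit is
    0.1 below the lower limit of the next class, as in Table 6 (129.5-139.4).
    """
    index = int((value - start) // width)
    lower = start + index * width
    upper = lower + width - 0.1
    return f"{lower:.1f}-{upper:.1f}"


# Hypothetical observations: (body-weight in pounds, relative cell volume in per cent).
observations = [
    (131.0, 42.0),
    (138.5, 41.5),
    (129.5, 43.4),
    (142.0, 45.0),
    (118.0, 39.9),
]

# Each individual is entered once, in the cell fixed by his two classes.
table = Counter(
    (class_label(weight, 89.5, 10.0), class_label(volume, 25.5, 2.0))
    for weight, volume in observations
)

print(table[("129.5-139.4", "41.5-43.4")])
```

Summing the cells of one row or one column of such a tally gives the marginal totals printed at the right and at the foot of a correlation table.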

ARRANGEMENT  OF  STATISTICAL  TABLES 

Much  of  the  cogency  and  force  of  statistical  tables,  otherwise 
correct,  depends  upon  their  arrangement.  This  is  a subject  about 
which  it  is  difficult,  if  not  wholly  impossible,  to  state  general 
principles,  yet  in  no  other  respect  is  it  easier  to  distinguish  the 
performance  of  the  experienced  professional  statistician  from  that 
of  the  amateur.  One  may  say:  “Make  a clear,  concise,  easily 
read  table,  which  bears  directly  upon  the  subject  under  discussion, 
and  upon  no  other  subject,”  but  obviously  this  counsel  is  rich  in 
why-ness  and  poor  in  how-ness.  Perhaps  an  illustration  may  be 
helpful. 
In  the  excellent  paper  by  Dr.  Huntington  Williams  on  “Epidemic
Jaundice  in  New  York  State,  1921-1922,”*  the  table  here  reproduced
as  Table  7 appears.

* Jour.  Amer.  Med.  Assoc.,  vol.  80,  pp.  532-534,  1923.

TABLE  7

Original  Form  of  Table  on  Symptomatology  of  Epidemic  Jaundice

                                            Cases positive.    Cases negative.      Not recorded.
Symptom.                                  Number. Per cent.  Number. Per cent.  Number. Per cent.

Jaundice                                      647      92.4       11       1.6       42       6.0
Anorexia                                      574      82.0       68       9.7       58       8.3
Nausea                                        619      88.4       46       6.6       35       5.0
Vomiting                                      503      71.9      169      24.1       28       4.0
Headache                                      488      69.7      139      19.9       73      10.4
Constipation                                  463      66.1      110      15.7      127      18.2
Prostration                                   211      30.1       81      11.6      408      58.3
Clay-colored stools                           558      79.7       46       6.6       96      13.7
Bile-stained urine                            617      88.2       10       1.4       73      10.4
Abdominal pain                                417      59.6      211      30.1       72      10.3
Fever                                         524      74.9      105      15.0       71      10.1
Chills                                        334      47.7      293      41.9       73      10.4
Limb pains                                    235      33.6      297      42.4      168      24.0
Diarrhea                                      106      15.2      442      63.1      152      21.7
Conjunctival congestion                        66       9.4      103      14.7      531      75.9
Epistaxis                                      61       8.7      525      75.0      114      16.3
Herpes                                         28       4.0      536      76.6      136      19.4
Hiccup                                         98      14.0      478      68.3      124      17.7
Unusual prevalence of rats on premises        167      23.9      262      37.4      271      38.7

Now  let  us  examine  the  first  purpose  of  this  table.  It  is  stated
in  the  original  that:  “Each  of  eighteen  common  symptoms  is
recorded  in  Table  1 (Table  7 here)  for  every  case  in  the  series  of
700  that  were  studied.  Symptoms  are  reported  [on  the  physician’s
original  case  reports  presumably]  positive,  negative,  or  not
recorded.”  Now,  plainly,  the  purpose  of  the  tabulation  is  to  show  the
relative  and  absolute  frequency  of  each  of  the  symptoms  taken  by
itself.  But,  plainly,  “not  recorded”  furnishes  no  information  about
symptoms.  It  only  tells  the  reader  that  no  record  was  made  of
symptoms.  Hence  its  inclusion  in  a table  which  only  purports  to
tell  us  about  symptoms  is  superfluous  and  wholly  beside  the  point.
But  since  the  “not  recorded”  cases  are  included  in  the  percentages
(which  add  to  100  across  the  table,  and  therefore  include  the  whole
of  each  universe),  the  percentages  defeat  the  main  purpose  of  the
table,  which  is  to  inform  us  as  to  which  symptoms  are  relatively
most  frequent.  Furthermore,  even  if  this  difficulty  were  corrected,
we  should  still  have  to  search  laboriously  down  the  list  to  find
which  was  the  most  frequent  symptom,  the  next  most  frequent,
and  so  on,  owing  to  the  fact  that  no  attention  is  paid  to  the  order
of  arrangement  of  the  symptoms.

TABLE  8

Showing  the  Absolute  and  Relative  Frequency  of  Occurrence  of  Different
Symptoms  in  So  Many  of  700  Cases  of  Epidemic  Jaundice  as  Furnished
Definite  Records  of  Presence  or  Absence  of  Each  of  the  Indicated
Symptoms

                                                                                              Total cases
                                                 Symptom present.    Symptom absent.      with any record
Order.  Symptom.                                    No. Per cent.      No. Per cent.  about this symptom.

    1   Jaundice                                    647        98       11         2                  658
    2   Bile-stained urine                          617        98       10         2                  627
    3   Nausea                                      619        93       46         7                  665
    4   Clay-colored stools                         558        92       46         8                  604
    5   Anorexia                                    574        89       68        11                  642
    6   Fever                                       524        83      105        17                  629
    7   Constipation                                463        81      110        19                  573
    8   Headache                                    488        78      139        22                  627
    9   Vomiting                                    503        75      169        25                  672
   10   Prostration                                 211        72       81        28                  292
   11   Abdominal pain                              417        66      211        34                  628
   12   Chills                                      334        53      293        47                  627
   13   Limb pains                                  235        44      297        56                  532
   14   Conjunctival congestion                      66        39      103        61                  169
   15   Unusual prevalence of rats on premises      167        39      262        61                  429
   16   Diarrhea                                    106        19      442        81                  548
   17   Hiccup                                       98        17      478        83                  576
   18   Epistaxis                                    61        10      525        90                  586
   19   Herpes                                       28         5      536        95                  564

Let  us  then  examine  the  table  (now  Table  8)  in  rearranged 
form,  to  fulfil  in  maximum  degree  possible  from  the  published  data 
the  fundamental  purpose  for  which  it  was  tabulated. 

Table  8 tells  the  story  of  symptomatology  much  more  simply, 
directly,  and  accurately  than  does  Table  7,  of  which  it  is  merely 
a rearrangement.  It  is  seen  at  a glance,  for  example,  that  more 
than  90  per  cent  of  the  cases  about  which  anything  definite  as  to
the  symptoms  was  known,  exhibited  at  least  one  of  the  four  following
symptoms:  jaundice,  bile-stained  urine,  nausea,  clay-colored  stools.
Fewer  than  20  per  cent  of  the  cases  had  either  diarrhea  or  hiccup,
or  epistaxis,  or  herpes,  each  taken  by  itself.

In  making  this  rearrangement  three  changes  were  made  from 
the  original  table: 

(a)  The  percentages  were  calculated  on  the  basis  of  the  known 
universe  of  discourse.  To  do  otherwise  in  this  case  makes  the 
percentages  virtually  meaningless. 

(b)  Percentages  were  tabled  only  in  whole  numbers.  No 
derivative  calculations  will  be  made  from  these  percentages.  Their 
sole  purpose  is  quickly  and  simply  to  inform  the  reader  of  the 
relative  frequencies  of  certain  conditions.  Decimals  are  only  an 
annoyance  under  such  circumstances. 

(c)  The  symptoms  are  arranged  in  descending  order  of  relative 
frequency.  This  makes  rapid  and  intelligent  reading,  and  evalua- 
tion of  the  table  as  a whole,  easy  of  accomplishment.  What  could 
be  more  desirable  if  the  author  wishes  to  instruct  and  entertain 
his  reader? 
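
Changes (a) to (c) can be retraced from the counts of Table 7 with a short Python sketch. The counts are those printed in Table 7; the function name rearrange is hypothetical. The handling of ties is an assumption: symptoms with the same whole-number percentage are left in their Table 7 order, which happens to reproduce the printed order of Table 8, although the text does not say how ties were settled.

```python
# (symptom, cases positive, cases negative, not recorded), in the order of Table 7.
TABLE_7 = [
    ("Jaundice", 647, 11, 42),
    ("Anorexia", 574, 68, 58),
    ("Nausea", 619, 46, 35),
    ("Vomiting", 503, 169, 28),
    ("Headache", 488, 139, 73),
    ("Constipation", 463, 110, 127),
    ("Prostration", 211, 81, 408),
    ("Clay-colored stools", 558, 46, 96),
    ("Bile-stained urine", 617, 10, 73),
    ("Abdominal pain", 417, 211, 72),
    ("Fever", 524, 105, 71),
    ("Chills", 334, 293, 73),
    ("Limb pains", 235, 297, 168),
    ("Diarrhea", 106, 442, 152),
    ("Conjunctival congestion", 66, 103, 531),
    ("Epistaxis", 61, 525, 114),
    ("Herpes", 28, 536, 136),
    ("Hiccup", 98, 478, 124),
    ("Unusual prevalence of rats on premises", 167, 262, 271),
]


def rearrange(rows):
    """Return rows of (symptom, present, % present, absent, % absent, known cases)."""
    out = []
    for symptom, present, absent, _not_recorded in rows:
        known = present + absent                      # (a) the known universe only
        pct_present = round(100 * present / known)    # (b) whole numbers only
        pct_absent = round(100 * absent / known)
        out.append((symptom, present, pct_present, absent, pct_absent, known))
    # (c) descending order of relative frequency; sorted() is stable, so ties
    # keep their Table 7 order (an assumption, see the note above).
    return sorted(out, key=lambda row: row[2], reverse=True)


table_8 = rearrange(TABLE_7)
for order, row in enumerate(table_8, start=1):
    print(order, *row)
```

Prostration shows most plainly what change (a) does: 211 positive cases are 30.1 per cent of all 700 cases, but 72 per cent of the 292 cases in which anything was recorded about the symptom.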

The  percentage  figures  of  Table  8 are  shown  graphically  in 
Fig.  35  of  Chapter  VI  on  p.  168. 

It  will  be  good  practice  for  the  reader,  in  developing  for  himself 
skill  in  the  planning  and  arrangement  of  tables,  mentally  to  criticize 
statistical  tables  as  he  encounters  them  in  his  general  medical  read- 
ing, and  try  whether  he  could  re-arrange  the  same  data  into  more 
accurate,  intelligible,  or  simple  form.  This  particular  process  will  be 
materially  aided,  to  say  nothing  of  the  general  training  in  accuracy 
and  precision  of  mental  processes  which  will  incidentally  accrue,  if 
one  approaches  a statistical  table  in  some  such  manner  as  this: 

What  is  the  purpose  of  this  table?  What  is  it  supposed  to 
accomplish  in  the  mind  of  the  reader? 

Does  it?  Well?  Indifferently?  Badly?  Not  at  all? 

Wherein  does  its  failure  of  attainment  fall? 

When  this  last  question  has  been  analyzed  and  settled,  the 
process  of  making  a satisfactory  table  to  accomplish  the  purpose  is 
much  more  than  half  finished. 
CHAPTER  V


ORIGINAL  SCIENTIFIC  RECORDS  AND  THEIR  TRANSLATION  TO  TABULAR  FORM

Up  to  this  point  in  the  discussion  original  statistical  data  have 
been  tacitly  assumed  to  be  given.  The  reader  has  not  been  required 
to  undertake  any  responsibility  regarding  their  collection.  We 
have,  to  be  sure,  examined  with  some  care  the  methods  by  which 
official  vital  statistics  are  obtained  (Chapter  III).  But  this  was  a 
special  study  of  how  an  official  government  body — the  Census 
Bureau— goes  about  furnishing  vital  statisticians  with  their  basal 
sustenance,  so  to  speak. 

It  is  only  a small  fraction  of  the  scientific  data  to  which  statistical 
methods  of  research  may  be  usefully  applied  that  governments  take 
the  trouble  to  furnish  to  students.  Mostly  the  student  of  any  sub- 
ject has  to  collect  his  own  data,  by  means  of  observation  and  experi- 
ment, from  the  phenomenal  world  around  him.  In  this  chapter 
the  attempt  will  be  made  to  discuss  briefly  some  general  principles 
underlying  the  collecting,  recording,  and  putting  into  tabular  form 
of  original  observational  data  in  the  manner  most  convenient  and 
useful  for  subsequent  statistical  treatment. 

THE  COLLECTION  OF  SCIENTIFIC  DATA 

All  scientific  data  are  answers  to  specific  questions  put  to 
Nature  by  the  investigator.  The  scope  of  both  the  question  and  the 
answer  is  necessarily  sharply  and  narrowly  delimited  in  each  par- 
ticular case.  For  example,  if  an  investigator  starts  collecting  data 
on  the  length  of  the  human  skull,  what  he  does  is  to  measure,  with 
appropriate  instruments  and  with  the  greatest  attainable  accu- 
racy, the  length  of  each  one  of  a series  of  skulls.  When  he  sets  down 
in  his  record  book  that  the  length  of  skull  No.  1 was  137  mm.,  the 
record  137  mm.  is  the  answer  to  the  implied  question  “what  is  the
length  of  skull  No.  1?”  Wherever  possible  science  asks  this  simple 
type  of  question,  and  records  the  answers  as  numerical  statements
of  quantity.  This  is  not  always  possible,  however,  basically  for  the 
reason  that  there  are  a great  many  things  that  are  interesting  which 
no  one  has  yet  found  a way  to  measure,  or  express  quantitatively. 
With  the  progress  of  science  the  number  of  such  things  is,  happily, 
all  the  time  getting  smaller.  But  it  is  still  undeniably  large.  Where 
the  simple  straightforward  numerical  answer  is  impossible,  the 
record  has  to  be  of  a more  complex  character.  For  example,  when 
a physician  makes  a stethoscopic  examination  of  the  lungs  and 
writes  down  as  a part  of  his  record  “moist  rales  at  left  base,”  the 
answer  is  enormously  more  complicated  in  its  implications  than  is 
the  craniologist’s  precise  figure  of  skull  length. 

If  the  primary  business  of  science  is  to  ask  questions  and  set 
down  the  answers  to  them,  then  the  questionnaire  may  be  regarded 
as  the  canonical  form  of  scientific  record.  This  is  employing  the 
term  “questionnaire”  in  a broader  and  more  inclusive  sense  than 
is  usual.  It  is  so  used  here  to  emphasize  the  essential  nature  of 
original  scientific  records,  namely,  that  they  are  individual  answers 
to  specific  questions.  If  this  concept  is  once  clearly  and  firmly 
grasped  by  the  mind,  it  will  greatly  help  the  student  in  wrestling 
with  the  never-ending  problems  of  methodology  which  will  keep 
on  arising  so  long  as  he  attempts  to  do  any  original  work  in  any 
branch  of  science.  For  it  will  enable  him  to  see  that  there  are  really 
only  two  great  methodological  problems  in  scientific  research.  The 
first  is:  “What  will  be  the  most  effective  and  useful  way  to  ask 
the  question?”  The  second  is:  “How  may  it  best  be  assured  that 
the  answer  shall  be  correct,  precise,  clear,  and  without  ambiguity?” 
Obviously  these  two  questions  are  not  wholly  independent.  A sub- 
stantial part  of  the  second  is  implied  in  the  first.  In  the  opinion  of 
many  competent  investigators  it  is  the  first  methodological  problem 
that  is  the  most  important  and  the  most  difficult.  They  hold  that 
the  successful  and  fruitful  outcome  of  original  investigation  depends 
most  upon  the  Fragestellung — how  the  question  is  put  to  Nature. 

Anything  like  adequate  didactic  instruction  to  the  beginner  on 
this  important  matter,  if  not  wholly  impossible  in  the  nature  of  the 
case,  is  certainly  too  vast  an  undertaking  for  the  scope  of  this  book 
as  well  as  for  the  limited  competency  of  its  author.  The  best  practical
help  here  which  can  be  offered  the  student  is  to  suggest  that
he  read  widely  and  deeply  in  the  history  of  science.  Reference 
No.  16  in  the  reading  list  at  the  end  of  the  chapter  was  especially 
prepared  to  give  the  student  a guided  start  in  this  direction.  The 
history  of  science  tells  us  at  least  how  the  great  investigators,  whose 
efforts  have  brought  about  the  achievement  of  so  much  knowledge 
of  Nature  and  her  laws  as  we  now  possess,  put  their  questions. 
And  example  is  not  a bad  pedagogical  technique. 

SOME  ESSENTIAL  IDEALS  IN  THE  MAKING  OF  SCIENTIFIC  RECORDS 

Let  us  turn  now  to  some  consideration  of  the  relatively  simpler 
problem  of  the  recording  of  the  answers  to  questions  scientifically 
asked.  This  may  be,  and  often  is,  regarded  as  merely  a sordid, 
mechanical  business,  not  worthy  really  serious  attention.  But  surely 
it  is  a pity  to  increase  the  labor  and  strain  of  scientific  work  by 
requiring  it  to  be  done  with  illegible,  incomprehensible,  or  ambig- 
uous original  records.  It  requires  a tremendous  amount  of  labor,  in 
the  aggregate,  to  collect  accurate,  scientific  data.  Surely  it  is  not 
much  to  ask  that  the  original  record  of  them  be  so  made  as  to  be 
permanently  clear,  precise,  and  useful  for  whatever  subsequent  pur- 
poses it  may  be  desired  to  put  it  to. 

The  following  list  of  desirable  characteristics  of  original  scien- 
tific records  makes  no  effort  toward  pedantic  completeness.  But 
perhaps  it  may  stimulate  the  student  himself  to  think  a little  when 
he  is  engaged  in  the  dull  spade  work  of  measuring,  counting,  taking 
case  histories,  or  making  physical  diagnoses. 

1.  Accuracy. — This  must,  of  course,  come  first  in  the  making  of
scientific  records.  It  is  attained,  more  than  in  any  other  way, 
by  the  exercise  of  two  rather  rare  human  qualities,  at  least  in 
their  native,  uncultivated  state.  These  are  carefulness  and  atten- 
tiveness. Most  mistakes  in  the  recording  of  the  results  of  experi- 
ments or  measurements  are  due  to  careless  wandering  of  the  atten- 
tion, momentarily  or  longer,  from  the  business  immediately  in 
hand.  Unfortunately  there  is  no  general,  infallible,  mechanical 
apparatus  adequate  to  obviate  this  difficulty.  Perhaps  the  best 
suggestion  is  that  a habit  be  formed  to  check  each  individual 
record  directly  after  it  is  set  down  on  paper,  there  and  then,  against 
the  observed  object,  while  the  latter  is  still  at  hand.  Of  course,  in
the  nature  of  things,  this  cannot  always  be  done.  But,  anyhow,  it 
will  not  be  a bad  habit  to  try  to  form  in  this  imperfect  world. 

2.  Altruism. — Every  scientific  worker  should  always  keep  before
his  mind  that  somebody  else  may  sometime  want  to  make  use  of  his 
original  records.  An  unexpected  disabling  illness,  or  death,  or 
even  the  getting  of  a better  job  may  lead  to  such  a situation. 
Whence  it  follows  that  every  page,  every  line,  and  indeed  every 
word  and  figure  of  the  record  should  be  absolutely  clear  as  to  its 
meaning,  in  a proximate  and  immediate  sense.  Everyone  is  prone 
to  abbreviate  and  condense  in  the  tedious  work  of  making  records. 
This  is  obviously  good  sense,  and  unobjectionable  in  principle. 
But  a meticulously  detailed  account  of  the  abbreviations,  the  man- 
ner of  condensation,  should  go  along  with  the  records.  Any  con- 
siderable experience  of  working  with  records  collected  by  other 
people  engenders  strong  views  on  this  point.  It  is  perfectly  natural 
for  anyone  to  feel  that  he  understands  his  system  of  abbreviating 
his  notes,  and  to  forget  that  what  is  so  clear  to  him  at  the  time  will 
not  be  clear  to  others,  or  incidentally  to  himself  after  a sufficient 
lapse  of  time,  unless  he  takes  the  trouble  to  set  down  the  explanation 
at  the  time  along  with  the  record. 

3.  Neatness  and  Legibility. — A scientific  record  which  no  one,
even  its  maker,  can  certainly  read,  may  be  accurately  defined  as  a 
total  loss.  If  it  is  difficult  to  read  it  is  a nuisance.  Taking  pains  to 
make  figures  and  writing  neat,  plain,  and  legible  pays  extremely  well 
in  subsequent  saving  of  time.  Furthermore  neatness  in  the  arrange- 
ment of  the  record  is  important. 

4.  Permanence. — Original  records  should  be  made  on  (a)  a good 
quality  of  paper,  cut  in  sheets  of  uniform  size,  and  either  bound  in 
a book  at  the  start,  or,  if  loose,  the  sheets  should  be  bound  as  soon 
as  the  particular  set  of  records  is  completed;  or  (b)  on  card  forms, 
cut  to  uniform  size  from  good  stock.  For  many  years  it  has  been 
the  practice  in  the  writer’s  laboratory  to  use  uniform  paper,  of  a 
standard  size,  ruled  to  suit  our  requirements,  for  all  records,  com- 
putations, and  preparation  of  manuscript.  At  the  end  of  a par- 
ticular study,  after  the  results  have  been  published,  all  of  the  papers 
connected  with  it  are  bound  neatly  together  in  heavy  board  covers 
by  the  laboratory  Diener  at  a cost  of  from  15  to  25  cents  a volume,
and  filed  for  permanent  record. 

Original  records  should  be  made  in  ink  wherever  possible.  It  is 
recognized  that  this  is  a counsel  of  perfection.  Some  people  are 
bound  to  use  pencils.  If  the  pencil  urge  is  uncontrollable,  it  should 
be  satisfied  with  either  indelible  pencils,  or  ordinary  pencils  with 
hard  leads.  Any  other  sort  will,  in  time,  lead  to  blurred  and 
illegible  records. 

5.  Comprehensiveness. — Nothing  is  more  annoying  in  working 
with  statistical  records,  or  indeed  any  other  kind  of  records,  than 
to  find  no  statement  whatever  made  about  some  particular  point, 
which  certainly  was  observed  at  the  time.  Such  omissions  arise 
chiefly  in  one  or  another  of  three  ways:  (a)  The  point  was  observed,
but  through  carelessness  was  not  recorded  in  every  case;  (b)  the  point
was  observed  to  be  “normal”  in  some  cases,  and  on  that  account 
thought  not  worth  recording;  (c)  in  planning  the  investigation  no 
place  was  made  in  the  scheme  for  observing  the  point  at  all.  Such 
gaps  in  the  records  may  be  avoided  by  two  relatively  simple  means. 
The  first  is  to  plan  the  investigation  in  advance  with  sufficient 
care  to  ensure  that  all  pertinent  data,  so  far  as  it  is  possible  to 
envisage  them  in  the  then  existing  state  of  knowledge,  shall  be 
included  in  the  plan  of  the  records.  The  second  is  to  make  it  an 
unfailing  rule  to  record  something  regarding  every  item  in  the 
record  plan  in  every  case.  This  “something”  may  be  a positive 
or  a negative  finding;  the  situation  may  be  “normal”  or  interestingly
abnormal;  but  in  any  case  something  about  it  should  go  into  the 
record.  This  matter  of  comprehensiveness  of  records  will  be  dis- 
cussed further  in  a later  section  of  this  chapter. 
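
The second means, recording something about every item of the plan in every case, lends itself to a mechanical check. The Python sketch below is illustrative only: the item names, the permitted entries, and the function name missing_items are hypothetical and are not taken from any record form described in the text.

```python
# Hypothetical record plan: the items on which every case must carry an entry.
RECORD_PLAN = ("jaundice", "nausea", "fever")

# Hypothetical set of definite entries; a blank or absent item is not one of them.
DEFINITE_ENTRIES = {"positive", "negative", "normal", "abnormal"}


def missing_items(case, plan=RECORD_PLAN):
    """Return the items of the plan about which the case record says nothing definite."""
    return [item for item in plan if case.get(item) not in DEFINITE_ENTRIES]


complete_case = {"jaundice": "positive", "nausea": "negative", "fever": "normal"}
gappy_case = {"jaundice": "positive", "fever": ""}

print(missing_items(complete_case))
print(missing_items(gappy_case))
```

A check of this kind, run as each case is entered, catches omissions of types (a) and (b) while the point can still be observed; it cannot repair type (c), which only advance planning prevents.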

6.  Minimal  Errors  of  Personal  Equation. — It  is  a well-established 
fact  that  observations  are  influenced  by  the  unconscious  bias  or  so- 
called  “personal  equation”  of  the  observer.  In  astronomy,  and  to 
some  extent  in  the  other  physical  sciences,  careful  attention  is  paid 
to  personal  equation.  Very  little,  though,  is  given  to  it  in  the  bio- 
logical sciences  or  medicine.  It  can,  however,  lead  to  considerable 
errors,  greater  in  magnitude  indeed  than  the  errors  of  random  sam- 
pling, to  which  the  statistician  pays  so  much  attention.  The  student 
should  read  the  important  papers  of  Pearson¹  and  Yule²  on  the
subject.  An  example  may  be  given  briefly  here  to  indicate  how
divergent  the  results  of  thoroughly  trained,  competent  biologists
may  be,  when  observing  identical  things.  Further  details  regarding
the  experimental  results  may  be  found  in  the  original  publication.³

When  a yellow  starchy  (flint)  variety  of  maize  was  crossbred 
with  a white  sweet  (sugar)  variety  there  was  produced  in  the  first 
generation  uniformly  yellow  starchy  progeny,  in  accordance  with 
Mendel’s  law  of  dominance.  When  these  first  generation  crossbred 
kernels  were  planted  they  gave  rise  to  a second  crossbred  genera- 
tion in  which  each  ear  bore  four  different  kinds  of  kernels,  in 
approximately  the  following  proportions:  9 yellow  starchy;  3 
yellow  sweet;  3 white  starchy;  1 white  sweet. 
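
The 9 : 3 : 3 : 1 proportions translate into expected counts once the number of kernels on an ear is known. The Python sketch below works this out for 532 kernels, the total reported for Ear No. 8 in Table 9; the function name expected_counts is hypothetical.

```python
from fractions import Fraction

# Mendelian proportions of the four kinds of kernels in the second generation.
RATIO = {
    "yellow starchy": 9,
    "yellow sweet": 3,
    "white starchy": 3,
    "white sweet": 1,
}


def expected_counts(total_kernels, ratio=RATIO):
    """Divide total_kernels among the classes in proportion to the ratio."""
    parts = sum(ratio.values())  # 9 + 3 + 3 + 1 = 16
    return {kind: Fraction(total_kernels * share, parts) for kind, share in ratio.items()}


expected = expected_counts(532)
for kind, count in expected.items():
    print(f"{kind}: {float(count):.2f}")
```

Exact fractions are used so that the four expected counts add to the total without rounding error; they agree with the row headed "Mendelian Expectation" in Table 9.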

Fifteen  trained  observers  were  asked  each  to  sort  into  these  four 
classes  and  count  independently  the  kernels  on  each  of  a number  of 
second  generation  ears.  The  results  of  the  count  on  one  such  ear 
are  shown  in  Table  9.  The  fifteen  observers  included  two  plant 
pathologists,  two  professors  of  agronomy,  one  professor  of  phil- 
osophy (originally  trained  as  a biologist),  four  biologists,  one  com- 
puter, one  practical  corn  breeder,  and  one  professor  and  three 
assistants  in  plant  physiology.  The  following  remarks  about  the 
group  are  pertinent  in  judging  the  results. 

“In  the  first  place  it  is  obvious  that  any  one  of  them  (with  the  possible  exception 
of  X)  might  in  the  ordinary  course  of  his  work  carry  on  a Mendelian  experiment  with 
maize,  either  independently  or  in  co-operation  with  someone  else.  If  this  were  done 
and  the  results  published  they  would  certainly  be  accepted  by  the  biological  public 
as  a precise  and  true  statement  of  the  facts  regarding  the  material  which  was  in  the 
experimenter’s  hands.  That  is,  if  any  worker  in  this  list  published  a statement  that  a 
Mendelian  experiment  which  he  had  conducted  with  maize  led  to  a ratio  of,  for  ex- 
ample 759  : 234  : 252  : 90  this  statement  would  not  be  doubted  or  questioned.  In  the 
second  place  it  is  worth  while  to  consider  the  training,  or  lines  of  work  with  which 
these  15  observers  have  had  to  do.  Of  six  (Nos.  I,  II,  XI,  XII,  XIII,  XIV)  the  train- 
ing and  work  has  been  primarily  botanical.  Four  of  these  (the  Danish  group,  Nos. 
XI  to  XIV  inclusive)  have  had  particularly  to  do  with  the  data  of  experimental  plant 
breeding,  in  connection  with  the  brilliant  and  fundamental  researches  of  Professor 
Johannsen.  The  training  and  special  field  of  work  of  five  (Nos.  V,  VI,  VII,  VIII, 
and  XV)  of  the  observers  has  been  zoological.  Of  these  five  three  (Nos.  VI,  VII,  and 
VIII)  have  had  experience  with  the  data  and  methods  of  investigation  in  experimental 
breeding.  Another  of  the  five  (No.  V)  adds  to  the  special  training  of  the  zoologist 
that  of  philosopher  and  psychologist,  which  by  traditional  standards,  at  least,  ought 
to  aid  in  the  development  of  a discriminative  judgment.  The  training  of  two  of  the
observers  (Nos.  III  and  IV)  has  been  agricultural.  Further,  both  of  these  men  belong
by  birth,  early  life,  and  education  to  the  “corn  belt”  section  of  the  country,  and 
are  thoroughly  and  intimately  familiar  with  maize.  They  have  had  experience  in 
corn  judging,  which  demands  the  appreciation  of  very  small  differences  in  ear  char- 
acters. Observer  No.  X,  while  not  a scientific  student  of  breeding,  has  had  successful 
experience  in  corn  breeding,  and  is  a careful  observer.  Observer  No.  IX  has  been 
especially  trained  in  biometric  work  in  the  writer’s  laboratory  and  has  had  considerable
experience  in  measuring,  sorting  small  variations  out  of  mixed  material,  and
similar  work.” 

TABLE  9

Showing  the  Classification  of  the  Kernels  of  Ear  No.  8 by  the  Different
Observers

                                             Classes of Kernels.
                            Yellow    Yellow     White     White     Total     Total
Observer.                 starchy.    sweet.  starchy.    sweet.  starchy.    sweet.

Mendelian Expectation.      299.25     99.75     99.75     33.25    399.00    133.00

I.                             352       102        52        26       404       128
II.                            322        49        82        79       404       128
III.                           298        75       108        51       406       126
IV.                            332       101        71        28       403       129
V.                             305       101        86        40       391       141
VI.                            313       100        90        29       403       129
VII.                           308        86        95        43       403       129
VIII.                          311       101        92        28       403       129
IX.                            327       101        78        26       405       127
X.                             308        92        95        37       403       129
XI.                            311        97        92        32       403       129
XII.                           313        99        91        29       404       128
XIII.                          308        97        95        32       403       129
XIV.                           312       104        91        25       403       129
XV.                            333        97        73        29       406       126

Totals.                       4753      1402      1291       534      6044      1936
Means.                      316.87     93.47     86.07     35.60    402.93    129.07

The  results  shown  in  Table  9 and  Fig.  13  seem  to  have  real 
significance  relative  to  the  methodology  of  biology  in  general, 
apart  from  specifically  genetic  problems.  It  must  be  remembered 
that  each  individual  handled,  sorted,  and  counted  the  same  identical 
kernels  of  corn.  They  were  required  to  discriminate  only  with  refer- 
ence to  the  color  and  the  form  of  each  kernel.  Yet  no  two  of  the 
fifteen  highly  trained  and  competent  observers  agreed  as  to  the  dis- 
tribution of  these  532  kernels.  When  it  is  recalled  that  pathologists, 
clinicians,  and  anthropologists  have  to  make  fine  distinctions 
relative  to  color  and  form  regularly  in  the  course  of  their  work,  the
thought  suggests  itself  that  perhaps  their  records  of  observation  on 
man,  a more  complicated  entity  than  a maize  kernel,  may  not  have 
that  absolute  and  ultimate  verity  that  some  naive  persons  perhaps 
suppose  they  have. 
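
The extent of the disagreement among the fifteen observers can be summarized directly from Table 9. The Python sketch below takes the counts as printed there and reports, for each class of kernel, the lowest count, the highest count, and the mean; the names CLASSES, COUNTS, and spread_by_class are hypothetical.

```python
from statistics import mean

CLASSES = ("yellow starchy", "yellow sweet", "white starchy", "white sweet")

# Counts for Ear No. 8 as printed in Table 9, one tuple per observer.
COUNTS = {
    "I": (352, 102, 52, 26),
    "II": (322, 49, 82, 79),
    "III": (298, 75, 108, 51),
    "IV": (332, 101, 71, 28),
    "V": (305, 101, 86, 40),
    "VI": (313, 100, 90, 29),
    "VII": (308, 86, 95, 43),
    "VIII": (311, 101, 92, 28),
    "IX": (327, 101, 78, 26),
    "X": (308, 92, 95, 37),
    "XI": (311, 97, 92, 32),
    "XII": (313, 99, 91, 29),
    "XIII": (308, 97, 95, 32),
    "XIV": (312, 104, 91, 25),
    "XV": (333, 97, 73, 29),
}


def spread_by_class(counts):
    """Return {class: (lowest count, highest count, mean count)} over all observers."""
    columns = zip(*counts.values())
    return {
        name: (min(column), max(column), round(mean(column), 2))
        for name, column in zip(CLASSES, columns)
    }


summary = spread_by_class(COUNTS)
for name, (low, high, average) in summary.items():
    print(f"{name:15s} lowest {low:3d}  highest {high:3d}  mean {average:.2f}")

# Every observer accounts for all 532 kernels, yet no two classifications coincide.
same_total = all(sum(row) == 532 for row in COUNTS.values())
all_different = len(set(COUNTS.values())) == len(COUNTS)
print(same_total, all_different)
```

The white starchy class, for example, runs from 52 to 108 kernels according to the observer, against a Mendelian expectation of 99.75.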


Fig.  13. — Diagram  showing  the  counts  for  ear  No.  8 by  each  of  the  different 
observers.  The  horizontal  dotted  line  gives  the  Mendelian  expectation,  and  the 
horizontal  dash  line  the  average  of  the  counts  of  all  15  observers. 


It  should  be  recognized  as  a general  principle,  and  kept  always 
in  mind,  in  measuring  and  recording,  that  every  individual  has  bias 
or  “personal  equation”  in  his  observing  and  measuring.  There  is 
no  way  completely  to  eliminate  its  effects.  The  most  that  can  be 
done  is  to  minimize  them.  The  first  step  toward  this  is  for  an 
individual  to  find  out  by  preliminary  observation  approximately
what  the  trend  and  amount  of  his  personal  equation  is  relative  to 
the  particular  thing  being  measured  or  observed.  Then,  at  least, 
he  knows  where  he  needs  guarding  against  himself,  and  can  make 
allowances  and  use  extra  care. 
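
One way such a preliminary trial might be summarized is sketched below in Python. It rests on an assumption the text does not make: that agreed reference values are available for the objects measured. The trend of the personal equation is then taken as the mean signed difference between the observer's readings and the reference values, and its amount of scatter as the standard deviation of those differences. All figures and the function name personal_equation are hypothetical.

```python
from statistics import mean, stdev


def personal_equation(observed, reference):
    """Return (trend, scatter) of an observer's readings against reference values.

    trend   = mean of the signed differences (observed minus reference)
    scatter = sample standard deviation of those differences
    """
    differences = [o - r for o, r in zip(observed, reference)]
    return mean(differences), stdev(differences)


# Hypothetical skull lengths in mm: agreed values and one observer's readings.
reference = [137.0, 142.0, 150.0, 128.0]
observed = [138.0, 143.0, 150.0, 129.0]

trend, scatter = personal_equation(observed, reference)
print(f"trend {trend:+.2f} mm, scatter {scatter:.2f} mm")
```

A positive trend would indicate an observer who tends to read high; knowing this, he can take extra care in that direction, though, as stated above, the effect can be minimized but not wholly removed.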

Space  cannot  be  spared  for  further  discussion  of  the  matter 
here.  But  before  leaving  it  entirely  the  student  who  is  interested 
in  philosophy  or  metaphysics  is  asked  to  contemplate  Table  9 
from  the  point  of  view  of  the  statistical  method  of  acquiring  knowl- 
edge. It  may  fairly  be  said  that  Ear  No.  8 carried  532  kernels. 
The  testimony  of  fifteen  independent  witnesses  agrees  to  this. 
Perhaps  with  as  great  warrant  as  is  ever  attainable  we  may  say 
that  we  know  that  Ear  No.  8 had  532  kernels.  But  how  many 
white  starchy  kernels  did  it  have?  I mean  how  many  did  it  really 
have?  There  must  have  been  some  determinate  number  because  it 
is  certainly  known  that  some  of  the  532  kernels  were  white  starchy. 
But  how  many?  It  seems  a simple  problem.  One  only  has  to  count 
them.  They  do  not  run  away  or  change.  But  still  I should  like 
to  know  how  many  of  them  there  were  on  this  ear.  And  still  more 
I should  like  to  know  some  method  by  which  definite  and  certain 
knowledge  on  the  point  could  possibly  be  obtained,  by  the  use  only  of
visual  observation  of  the  kernels  themselves  and  the  process  of 
counting.  Examine  the  fourth  column  of  Table  9 and  think  it  over. 

7.  Purposeful  Adaptation. — Original  record  forms  should  be
carefully  planned  in  advance  so  that  the  orderly  arrangement  of 
the  individual  items  will  most  effectively  conduce  to  speed  and 
accuracy,  first  in  the  recording  of  the  original  observations,  and 
second,  in  their  subsequent  tabulation.  For  example,  in  planning 
a blank  form  for  recording  anthropometric  measurements  all  those 
measurements  which  are  taken  with  one  instrument,  for  instance 
the  heights  measured  with  the  anthropometer,  may  conveniently 
follow  each  other  consecutively  in  one  group. 

8.  Inclusiveness. — All  observations  made  should  be  included  in 
the  original  records,  as  they  are  made.  If  some  particular  observa- 
tions are  suspected  of  being  bad,  a note  should  be  made  saying  so. 
But  they  should  not  be  thrown  away.  To  do  so  is  to  implant 
permanently  in  the  observer’s  hitherto  pure  mind  a horrid  (however 
small)  conviction  of  the  sin  of  picking  and  choosing  only  favorable 
cases.  No  honest  man  would,  of  course,  be  guilty  of  intention- 
ally doing  this.  But  the  only  way  to  avoid  it  is  never  to  take 
the  first  step.  Put  down  on  the  record  everything  that  Nature 
offers.  Later  on  the  record  can  be  looked  over  and  studied,  and  a 
calm  and  reasoned  attempt  can  be  made  to  see  what  it  all  means. 
But  if  part  of  what  was  actually  observed  has  been  omitted  from  the 
record  nothing  further  can  ever  be  honestly  done  about  that  fact. 
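For the reader who keeps his records in a computer file rather than a notebook, the rule of inclusiveness may be sketched as follows. This is only a minimal illustration in Python, not part of the original procedure: the names `Observation`, `suspect`, `note`, and `summarize` are assumptions made for the example, and the three measurements are invented. The point shown is that a doubtful observation is flagged and annotated but never removed, so that both the full and the restricted summaries remain available later.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Observation:
    value: float
    suspect: bool = False  # flagged as doubtful, but still kept on the record
    note: str = ""         # the observer's reason for distrusting it


def summarize(records):
    """Return (mean of all values, mean of the unflagged values)."""
    all_values = [r.value for r in records]
    trusted = [r.value for r in records if not r.suspect]
    return mean(all_values), mean(trusted)


# Invented measurements, for illustration only.
records = [
    Observation(10.0),
    Observation(12.0),
    Observation(20.0, suspect=True, note="instrument slipped"),
]

mean_all, mean_trusted = summarize(records)
print(len(records), mean_all, mean_trusted)
```

Because the suspected value stays in `records`, a later student can still decide for himself whether the note justifies setting it aside.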

9.  Absence  of  Ambiguity. — A record  which  is  capable  of  being 
read  in  either  of  two  ways  is  a thorn  in  the  scientific  flesh.  Unless 
care  is  given  to  the  point  such  records  turn  up  with  curious  fre- 
quency. Let  an  example  point  the  precept.  It  is  the  prevailing 
custom  in  America  to  write  dates  in  the  form  April  2,  1930.  It  is 
a common  custom  in  Europe  to  write  dates  in  the  form  2 April, 
1930.  In  both  America  and  Europe  scientific  men  have  a habit 
of  using  a numerical  shorthand  in  dating  their  records.  But  who 
is  to  be  sure  whether  a record  of  4/2/1930,  considered  as  record, 
means  April  2,  1930,  or  February  4,  1930?  It  has  long  been  a rule 
of  the  writer’s  laboratory  that  all  dates  should  be  of  the  form 
Apr.  2,  1930,  as  a maximum  concession  to  the  shorthand  urge. 
But  in  going  over  our  old  records  it  is  amazing  to  see  how  many 
have  risen  superior  to  any  such  attempted  restraint  upon  free  self- 
expression.  And  of  what  conceivable  use  is  a date  record  4/2  say 
n years  after  the  actual  year  of  its  making?  Other  examples  of 
ambiguous  records  might  be  given. 
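The ambiguity of the numerical shorthand can be demonstrated mechanically. The following is a minimal sketch in Python, offered as an illustration only: the function names `readings` and `laboratory_form` and the list of month abbreviations are assumptions for the example (the text fixes only the form "Apr. 2, 1930"). The first function lists every calendar date that a shorthand such as 4/2/1930 could mean under the American and European conventions; the second writes a date in the unambiguous form.

```python
from datetime import date, datetime

# Month abbreviations are an assumption; the source gives only "Apr."
MONTHS = ["Jan.", "Feb.", "Mar.", "Apr.", "May", "June",
          "July", "Aug.", "Sept.", "Oct.", "Nov.", "Dec."]


def readings(shorthand):
    """Return every date a numeric month/day or day/month shorthand could mean."""
    found = set()
    for fmt in ("%m/%d/%Y", "%d/%m/%Y"):  # American order, then European order
        try:
            found.add(datetime.strptime(shorthand, fmt).date())
        except ValueError:
            pass  # not a valid date under this convention
    return sorted(found)


def laboratory_form(d):
    """Write a date with the month spelled out, e.g. 'Apr. 2, 1930'."""
    return f"{MONTHS[d.month - 1]} {d.day}, {d.year}"


for d in readings("4/2/1930"):
    print(laboratory_form(d))
```

A shorthand whose day exceeds 12, such as 4/13/1930, has only one valid reading, which is precisely why the ambiguous cases pass unnoticed until someone tries to use them.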

MEDICAL  RECORDS 

It  will  perhaps  be  useful  at  this  point  to  transfer  the  discussion 
from  the  general  to  the  specific,  and  consider  medical  records  as 
examples  pertinent  to  the  interest  of  the  class  of  readers  for  which 
this  book  is  especially  intended. 

The  fundamental  medical  record  is  the  individual  case  history. 
Upon  it  depends  any  and  all  useful  information,  whether  statistical 
or  otherwise  in  character,  which  may  be  wanted  for  any  purpose 
whatever.  It  is,  therefore,  of  the  highest  importance  that  case 
histories  conform  to  the  best  standards  of  scientific  record  making, 
on  the  one  hand,  and  of  modern  business  office  practice  on  the  other 
hand.  There  seem  to  be  relatively  few  hospitals  where  the  highest 
standards  in  either  of  these  respects  are  even  approximated. 

From  the  standpoint  of  scientific  record  taking,  case  histories 
are  most  often  defective  in  what  they  fail  to  record  about  the 
patient.  It  is  by  no  means  impossible  to  find  case  histories  that 
fail  to  record  the  sex  of  the  patient;  while  any  indication  of  what 
kind  of  person  he  was,  in  the  common  sense  of  the  word,  whether 
fat  or  lean,  white  or  colored,  rich  or  poor,  young  or  old,  etc.,  is  all 
too  frequently  kept  a deep  secret  from  any  subsequent  reader  of  the 
history.  Again,  even  in  the  special  medical  portions  of  the  history 
the  writer  forgets,  with  almost  unbelievable  frequency,  to  make 
any  record  of  highly  important  facts. 

The  root  of  the  difficulty  apparently  lies  in  the  method  by  which 
case  histories  are  written.  The  general  scheme  or  outline  which  a 
history  is  to  follow  seems  often  to  reside  in  the  head  of  the  particular 
writer,  and  there  only.  And  heads,  especially  of  human  beings,  do 
vary  so!  There  is  a simple  procedure  which  will  help  to  remedy  the 
difficulty.  It  is,  as  a first  step,  to  draw  up  and  have  printed  a 
series  of  standard  history  forms,  which  will  cover  not  merely  general 
routine  facts  common  to  all  diseased  conditions,  but  special  forms 
as  well,  for  at  least  all  of  the  more  frequently  occurring  conditions. 
These  blank  forms  will  contain  definitely  indicated  spaces  in  which 
some  statement  of  fact,  either  positive  or  negative,  absolutely  must 
be  recorded  in  every  single  case.  If  on  the  case  record  form  for  gall- 
stone cases,  for  example,  there  is  printed  the  question,  “Did  this 
patient  ever  have  typhoid?”  or  the  equivalent  of  this  question,  one 
or  another  of  three  answers  may  be  definitely  recorded,  either 
“yes,”  or  “no,”  or  “nobody  knows.”  If,  furthermore,  every  worker 
in  the  service  clearly  understands  that  any  history  for  which  he  is 
responsible  that  comes  into  the  history  department,  with  any  blank 
spaces  in  its  standardized  portion,  will  not  be  accepted  for  filing, 
but  will  be  forthwith  returned  to  him  for  completion,  future  students 
will  be  able  to  compile  comprehensively  and  definitely  the  teach- 
ings of  the  experience  of  that  hospital  relative  to  the  etiologic  rela- 
tions between  typhoid  and  biliary  calculi. 
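Where histories are kept in machine-readable form, the rule that no standardized space may be left blank can be enforced automatically. What follows is a minimal sketch in Python under stated assumptions: the field names `sex` and `ever_had_typhoid`, and the functions `blank_spaces` and `accept_for_filing`, are hypothetical and stand in for whatever the printed form actually contains. The three permitted answers are those named in the text.

```python
from enum import Enum


class Answer(Enum):
    YES = "yes"
    NO = "no"
    UNKNOWN = "nobody knows"


# Hypothetical standardized portion of a gall-stone case form.
REQUIRED_QUESTIONS = ("sex", "ever_had_typhoid")


def blank_spaces(history):
    """List the standardized questions that were left unanswered."""
    return [q for q in REQUIRED_QUESTIONS if history.get(q) is None]


def accept_for_filing(history):
    """Accept a history only if nothing in the standardized portion is blank."""
    return not blank_spaces(history)


complete = {"sex": "F", "ever_had_typhoid": Answer.UNKNOWN}
incomplete = {"sex": "M"}

print(accept_for_filing(complete))    # an explicit "nobody knows" is an answer
print(blank_spaces(incomplete))       # returned to the writer for completion
```

The design choice worth noting is that "nobody knows" is a recorded answer and passes the check, whereas an empty space does not.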

It  is,  of  course,  to  be  understood  that  no  blank  form,  however 
carefully  it  may  be  devised,  can  ever  suffice  for  the  recording  of  the 
whole  history.  There  must  be  some  portions  written  or  dictated 
with  entire  freedom  from  Procrustean  rigidities.  The  reason  why 
this  is  so  is  plain.  One  of  the  chief  characteristics  of  living  things, 
whether  men  or  mice,  is  that  they  vary  individually.  But  formal 
blanks  do  not  vary.  An  invariable  phenomenon  cannot  accommo- 
date itself  to  a variable  one.  But  this  is  no  valid  argument  against 
having  certain  essential  parts  of  the  history  recorded  in  standardized 
form.  There  are  some  facts  that  everyone  will  agree  ought  to  form 
a part  of  every  case  history  which  is  to  be  permanently  preserved. 
It  is  that  class  of  facts  which  should  be  recorded  upon  standardized 
formalized  sheet  or  sheets  incorporated  into  each  history.  Then,  in 
addition,  the  clinician  may  write  or  dictate  as  much  more  as  he  likes 
in  an  entirely  free  untrammeled  style.  The  formalized  portion 
merely  serves  as  the  schema  of  the  whole,  to  make  sure  that  no  point 
of  importance  for  future  students  is  left  out,  because  forgotten,  in 
the  greater  present  interest  of  other  more  immediately  exciting 
features  of  the  case. 

It  is  particularly  important  that  a definite  statement  or  record 
be  made  that  a structure  or  function  is  normal  when  it  is  so.  In  the 
minds  of  many  persons,  perhaps  particularly  in  the  field  of  medicine, 
there  has  grown  up  the  notion  that  what  is  normal  is  of  no  interest 
and,  therefore,  nothing  needs  to  be  said  about  it  in  the  record. 
Later  on  someone  comes  to  study  the  record.  Let  us  say,  to  take 
a concrete  example,  that  this  subsequent  student  wants  to  know 
definitely  whether  the  tonsils  in  this  particular  case  were  diseased 
or  not.  No  mention  of  tonsils  can  be  found.  Two  alternatives 
then  present  themselves  to  the  second  student: 

1. The tonsils were not diseased, and on that account the original recorder said nothing about them.

2. The original recorder forgot to look at the tonsils or forgot to make a record of his findings.

Either  horn  of  the  dilemma  is  equally  unfortunate.  “No  information” 
is  the  sad,  but  only  possible,  conclusion  which  can  be  regarded 
as  accurate. 
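The same conclusion follows when such records are tabulated. The sketch below, in Python, is an illustration only: the function `tally` and the four invented histories are assumptions for the example. It counts the recorded findings for one item and places every history that is silent on the item in a class of its own, instead of quietly adding it to the normal cases.

```python
from collections import Counter


def tally(histories, item):
    """Count recorded findings for one item; silence is 'no information'."""
    return Counter(h.get(item, "no information") for h in histories)


# Invented histories, for illustration only.
histories = [
    {"tonsils": "normal"},
    {"tonsils": "obviously diseased"},
    {},                       # tonsils never mentioned
    {"tonsils": "normal"},
]

counts = tally(histories, "tonsils")
print(counts)
```

Only the two histories that explicitly say "normal" may be counted as normal; the silent one can be assigned to neither class.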

EXAMPLES  OF  BLANK  FORMS 

In  this  section  it  is  proposed  to  give  some  examples  of  record 
forms,  which  have  been  successfully  and  satisfactorily  used  in 
actual  investigations  over  a number  of  years.  These  will  help  to 
illustrate  some  of  the  general  principles  which  have  been  discussed 
above.  It  should  be  understood  that  these  examples  are  not  put 
forward  as  models.  Perfection  is  rarely  attained  in  this  rapidly 
moving  world,  and  almost  never  by  statisticians.  The  examples  of 
blank  forms  here  presented  are  open  to  criticism  from  a number  of 
viewpoints.  But,  even  with  their  recognized  imperfections,  they 
have  worked  well  in  practice.  Every  time  a new  batch  of  these 
blanks  is  printed  some  defects  which  experience  has  brought  to 
light  are  corrected,  and  some  new  items  or  improvements  added. 
This  somewhat  easy-going  attitude  represents  the  sum-total  of  our 
striving  toward  absolute  perfection  in  record  forms. 

If  any  reader  should  wish  to  make  trial  of  such  record  forms  as 
are  here  illustrated,  he  should  not  copy  slavishly  these  particular 
forms.  Instead  he  should  draw  up  his  own,  designing  them  to  meet 
specifically  his  particular  needs,  just  as  these  were  drawn  up  to 
meet  our  special  requirements.  The  examples  given  here  can,  and 
should,  at  best,  serve  only  as  suggestions. 

In  Figs.  14-21  inclusive  are  a part  of  the  forms  used  in  the 
investigation  of  constitutional  factors  in  disease  which  has  been  in 
progress  for  the  past  five  years  in  the  Institute  for  Biological  Research 
of  the  Johns  Hopkins  University.4  These  eight  forms  have 
to  do  with  the  personal  and  medical  history,  and  the  physical  ex- 
amination, of  a patient  being  studied  constitutionally.  The  prob- 
lem of  primary  interest  in  our  constitutional  work  has  been  that 
presented  by  the  clinical  picture  commonly  called  “essential  hyper- 
tension,” and  the  blanks  have  all  been  constructed  with  that  prob- 
lem in  mind.  This  will  account  for  some  of  the  obvious  omissions, 
were  these  to  be  regarded  as  general  medical  history  blanks. 

All  the  forms  are  printed  on  sheets  11  x 8.5  inches,  of  a good 
grade  of  bond  paper.  A binding  margin,  1 inch  wide,  is  left  on  the 
left-hand  long  side  of  each  sheet.  All  forms  are  printed  on  one  side 
of  the  paper  only,  leaving  the  back  available  for  additional  notes. 

Forms  A 1-8  are  used  exclusively  for  data  pertaining  to  the 
patient  (and  when  possible  to  other  members  of  the  family  who 
consent  to  have  complete  examinations  made),  and  are  filled  in  by 
the  examining  physician,  who  is  especially  trained  in  the  technic. 
In  planning  the  investigation  it  was  constantly  borne  in  mind 
that  the  data  collected  should  comply  with  the  following  criteria. 


Constitutional Form A-1

Family Name:        Family No.        Date

THE INDIVIDUAL RECORD

Family assigned to
Disp. Hist. No.
Hosp. Hist. No.

Name (give name in full. In case of married women give maiden name also)

Address:  Number    Street    City or Town    State

Sex: M. F.    Social status: S. M. W. D.    Color: W. B.

Race: (specify, using code on A-04)

Date of birth: Day    Month    Year    Age now

Date of marriage: Day    Month    Year    Age at marriage

Born in: City    Province    Country

Came to this country: Year

In what places has person resided during life? all  mostly  country  city
(Specify places)

Occupational History:

What occupations has patient followed during life? Give dates as accurately as possible.

To what extent has work involved hard manual labor?

CLINICAL HISTORY

Complaint

Present illness (story in patient’s own words)    Began in: Month    Year

Age at onset (years)    Stopped work: Month    Year

Symptoms and condition:

Past History:

General health: (before P. I.) Very good (never ill); good (minor ailments only); fair (average amount of sickness); poor (frequently sick); very poor (an invalid throughout life).


Fig.  14. — Constitutional  Form  A-l  in  reduced  facsimile. 


The questions asked must be (1) strictly relevant to the purpose of
the investigation; (2) inclusive, i. e., must apply to all individuals,
allowing for differences of sex and age; (3) systematic, i. e., grouped
in logical order; (4) specific, i. e., definite and unambiguous; (5)
comprehensive, i. e., covering the fields the investigators were capable
of exploring; (6) objective, i. e., avoiding individual bias as
far as possible; and (7) conveniently arranged for recording and
analyzing.


Constitutional Form A-2

Family Name:        Family No.
Name of Person:

Operations: (specify kinds and dates of each)

HOSPITAL ADMISSIONS

Name of Hospital    Month    Year    Diagnosis    Length of Stay

Infections and Diseases    No. of Attacks    Complications    (give dates or ages if possible)

Measles
Mumps
Whooping cough
Scarlet fever
Diphtheria
Tonsillitis
Rheumatic fever
Chorea
Typhoid fever
Pneumonia
Pleurisy
Bronchitis
Influenza
Tuberculosis
Heart trouble
Bright’s disease
High blood pressure
    1st discovered (give date and physician)

Hair: is not turning grey, is turning grey, completely grey
age when began to turn    (yrs.), finished turning    (yrs.)

Headaches: has, does not have, mild, moderate, severe, frontal, occipital, unilateral, entire head, associated
with nausea, not associated with nausea, scotomata present, absent. Frequency: daily,
weekly, monthly, yearly.

Eyes: Glasses not worn, worn (specify physician or where purchased)

Near sighted, far sighted, astigmatism. Night blindness: yes  no
Reads newspaper fine print without glasses.      Rt. yes  no    Lt. yes  no
Reads newspaper fine print only with glasses.    Rt. yes  no    Lt. yes  no
Unable to read fine print even with glasses.     Rt. yes  no    Lt. yes  no
Unable to recognize friend across street.        Rt. yes  no    Lt. yes  no

Ears: Hearing normal. History of ear disease (specify)

Nose: Breathes freely through left side, mouth closed. Yes  no
Breathes freely through right side, mouth closed. Yes  no
Snores: Yes  no. Sleeps with mouth open, shut.
History of disease of nose and throat.


Fig. 15. — Constitutional Form A-2 in reduced facsimile.





The  forms  set  up  for  recording  the  data  cover  with  reasonable 
completeness,  subject  to  the  implications  of  criterion  5,  all  fields 
Constitutional  Form  A-3 


Family  Name:  Family  No. 

Name  of  Person: 

Teeth:  No.  extracted  1,  2,  3,  4,  5,  6,  7.  Why?  pyorrhoea,  abscess,  toothache. 

Gums  bleed  occasionally,  frequently. 

Goitre:  No,  yes  (specify) 

Cardio  Resp:  Continually  clearing  throat,  chronic  cough,  morning  cough 

Expectoration : none,  occasional,  frequent ; hard  to  raise,  easy  to  raise,  white,  grey,  yellow, 
blood  streaked  1,  2,  3,  4,  5,  6 times,  large  amount  of  blood  1,  2,  3,  4,  5 times 

Severe  pain  on  breathing:  rt.  side,  lt.  side 

Night  sweats:  occasional,  nightly 

Shortness  of  breath:  none,  running  for  a street  car,  walking  up  one  flight  of  steps,  walking 
along  level  street,  sudden  attacks  requiring  pt.  to  sit  up  in  bed 

Consciousness  of  rapid  beating  of  heart:  none,  with  excitement,  after  eating,  with  exercise, 
sudden  onset  while  sitting  still,  sudden  onset  while  lying  in  bed 

Does  pt.  have  attacks  of  pain  in  chest  (over  heart  or  in  lt.  arm)  which  come  on  suddenly: 
with,  without  exercise,  causing  fear  of  death,  or  cessation  of  all  activity  for  the  moment? 
Yes,  no,  occasional,  frequent,  daily. 

Other  history  of  heart  or  lung  diseases:  (specify) 

Gastro-intestinal:  Eats:  heartily,  average,  sparingly,  always  hungry  but  afraid  to  eat,  never  hungry 

Bowel  movements:  regular,  irregular,  1,  2,  3,  4,  5 daily,  weekly,  easy,  difficult,  loose 

Piles:  none,  itch,  protrude,  painful,  bleed. 

Abdominal  pain : none,  upper  half,  lower  half,  relieved  by  eating,  increased  by  eating,  dull, 
severe,  shooting 

Hernia:  no,  yes  (specify  kind) 

Other  history  of  abdominal  disease  (specify) 


Genito-urinary : Gets  up  at  night  to  pass  urine  0,  1,  2,  3,  4,  5 times 
Age  at  onset  of  nocturia  (yrs.) 

Gonorrhea:  attacks  0,  1,  2,  3,  age  at  first  attack  (yrs.)  Epididymitis,  prostatitis 


Syphilis:  no,  yes,  chancre,  secondary  rash,  hereditary,  other  manifestations:  (specify) 
Age  at  onset  (yrs.) 

Treatment:  (specify  and  state  physician  or  dispensary) 

Other  history  of  genito-urinary  disease 


Menstrual  History:  Age  at  onset  (yrs.)  Painful,  painless,  profuse,  average,  scanty 
Regular:  interval  days,  duration  days 

Irregular:  usual  interval    days;  shortest  interval    days;  longest  interval  (unassoc.  with 
pregnancy)    mos.  duration    days 

Completely  irregular:  shortest  interval    days,  longest    days. 

Duration:  usual  days;  shortest  days;  longest  days. 

Last  time : month  year 


Fig.  16. — Constitutional  Form  A-3  in  reduced  facsimile. 


of  general  medicine.  They  do  not  include  such  specialties  as  psy- 
chiatry, obstetrics,  and  genito-urinary  diseases  of  males  and 
females,  because  these  and  certain  other  branches  of  medicine  are 
at  present  outside  the  scope  of  our  investigations;  but  for  them 
supplementary  forms  along  similar  lines  could  readily  be  constructed, 


Constitutional  Form  A-4 

Family  Name:  Family  No. 

Name  of  Person: 

Marital History: Number of pregnancies (or wife’s) 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14

Number of miscarriages             1   2   3   4   5   6   7   8
    Month of pregnancy

Number of premature living born    1   2   3   4   5   6   7   8
    Month of pregnancy

Number of term living born 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14
Number of born dead 1, 2, 3, 4, 5, 6


Sexual Habits: Frequency 1, 2, 3, 4, 5, 6, 7 per week month year

Birth  Control:  Not  used  used  (specify  methods) 


Allergic  History:  Hay  fever,  asthma,  angioneurotic  oedema,  urticaria,  food,  other  protein,  other 
symptoms:  (specify) 

Skin:  (History  of  any  skin  disease) 

Congenital  malformations,  tumors,  etc.  (specify) 

Bleeding:  (History  suggestive  of  hemophilia) : No  yes  (specify) 

Nervous  system:  Frequency  of  attacks  of  dizziness:  none,  daily,  weekly,  monthly,  yearly 
Patient  has  fallen  in  attacks,  has  not  fallen 

Frequency  of  convulsions : General : none,  daily,  weekly,  monthly,  yearly 

Localized:  none,  daily,  weekly,  monthly,  yearly 
Other  history  of  nervous  disorders:  (specify) 


Locomotor  System:  History  of  disease  of  joints,  difficulty  in  walking,  etc.  (specify) 


Weight:  Has  there  ever  been  any  rapid  change  in  weight  during  adult  life? 

Gained  lbs.  yr.  Why? 

Lost  lbs.  yr.  Why? 

Habits:  Sleeping:  Average  hours  5,  6,  7,  8,  9,  10  sleep  unbroken,  broken,  1,  2,  3,  4,  5 times 

Why?  Awakens  rested,  tired. 

Alcohol  used:  no,  yes  (get  as  full  details  as  possible  throughout  life) 

Wine?  Beer? 

Whiskey  or  other  spirits? 

Tobacco  used:  no,  yes  (get  as  full  details  as  possible  throughout  life) 

Pipe  ? Cigars  ? 

Cigarettes  ? Chewing  ? 

Snuff? 


Fig.  17. — Constitutional  Form  A-4  in  reduced  facsimile. 


and  in  certain  cases,  notably  in  the  study  of  Friedreich’s  ataxia,  we 
have  made  such  supplementary  record  forms. 

Forms  A 1-4,  shown  in  Figs.  14-17  inclusive,  deal  with  the  essential 
facts  of  the  patient's  personal  and  medical  history,  as  obtained  by 
direct  questions  asked  by  the  physician.  The  questions  are  simple 


Constitutional Form A-5

Family Name:        Family No.
Name of Person:

PHYSICAL EXAMINATION

Date
Body temperature:    Hour when taken    Examined by

Head: Mucous membrane: normal color, pale, cyanotic

Eyes: deep set, average, prominent, very prominent. Upper lids: normal, puffy.
Lower lids: normal, puffy, very puffy. Arcus senilis: none, slight, marked.
Pupils: regular, irregular, equal, unequal, lt. larger, rt. larger, react to light, react on accommodation.
E. O. M.: normal, abnormal (specify)

Nose and throat: Nasal obstruction: none, moderate, marked, left, right.
Tonsils present, absent, normal, obviously diseased.

Teeth: gums and mouth: pyorrhoea, none, mild, moderate, severe, other abnormality (specify)

Upper teeth: Right    Left        Lower teeth: Right    Left

(Strike out those that are missing; C - crown. B - bridge. F - filled. G - nonvital teeth. H -
obvious cavities unfilled. R - roots only.)

Ears: Hearing: hears watch tick      Rt. ear yes  no    Lt. ear yes  no
               hears normal voice    Rt. ear yes  no    Lt. ear yes  no
               hears loud voice      Rt. ear yes  no    Lt. ear yes  no
Discharge: Lt. ear yes  no.  Rt. ear yes  no

Neck: Thyroid is: not felt, isthmus palpable, lobe palpable, right, left, smooth, nodular, markedly
enlarged.
Trachea is: in midline, deviated to left, right

Lymph nodes: Angles of jaw:    small, large, tender, soft, firm, not felt
             Cervical:         small, large, tender, soft, firm, not felt
             Epitrochlears:    small, large, tender, soft, firm, not felt
             Axillary:         small, large, tender, soft, firm, not felt
             Inguinal:         small, large, tender, soft, firm, not felt

Chest: 1. General description:
Clavicles: invisible, visible, prominent
Chest shape: normal, flat, chicken breasted, barrel shaped
Harrison’s groove: present, absent
Costal margin: normal, flaring
Movement of chest wall: rt. equals lt., rt. more, rt. less

Respiration rate:

Fig.  18. — Constitutional  Form  A-5  in  reduced  facsimile. 


and  direct,  but  comprehensive  withal,  and  are  restricted  to  cir- 
cumstances, events,  bodily  functions  and  variations,  symptoms  and 
diseases,  concerning  which  a person  of  ordinary  intelligence  can 



usually  give  reliable  answers,  or,  if  the  patient  is  a child,  which  the 
mother  can  answer.  They  differ  little  from  those  covered  in  the 


Constitutional Form A-6

Family Name:        Family No.
Name of Person:

2. Lungs: Distance of lung bases at rest below 7th cerv. spine    cm.
   if not at equal levels rt.    cm., lt.    cm.
   Condition of lungs on exam.: normal, abnormal (specify pathology)

3. Heart: Dorsal, upright
   P. M. I., seen, not seen; localized, diffuse, weak, forceful
   Distance from midline    cm., interspace 4, 5, 6, 7;
   Distance from suprasternal notch    cm.
   Cardiac dullness:    cm. to left M. S. L.;    cm. to right M. S. L.
   Heart rhythm: regular, presystolic gallop, protodiastolic gallop, extrasystolic, fibrillating
   Murmurs: none, yes (specify)

4. Pulse: Dorsal, upright. Rate    Blood pressure: S.    D.
   Rhythm: regular, extrasystole, none, occasional, frequent, completely irregular
   Arteries: Temporal: Vessel wall seen, not seen, tortuous, very tortuous
   Radial: Rt. pulse equals lt., rt. larger, rt. smaller.
   Vessel walls: not felt, felt, soft, hard, diffuse thickening, nodular, straight, tortuous, very tortuous
   Brachial: vessel walls (elbow straight): not seen, seen, not felt, felt, soft, hard, diffuse
   thickening, nodular, straight, tortuous, very tortuous.
   Posterior tibial: Pulse present, absent, rt. equals lt., rt. larger, rt. smaller
   Dorsalis pedis: Pulse present, absent, rt. equals lt., rt. larger, rt. smaller

Abdomen: Abdominal walls: fatty layer: thin, medium, thick. Muscle wall: relaxed, not relaxed,
flabby, firm: Muscle spasm localized in U. R. Q., U. L. Q., L. R. Q., L. L. Q.
Liver: not felt, felt    cm. below costal margin, smooth, not smooth
Spleen: not felt, felt    cm. below costal margin, smooth, not smooth
Kidney rt.: not felt, felt, normal size, enlarged, tender
Kidney lt.: not felt, felt, normal size, enlarged, tender
Hernia: no, yes, rt., lt., direct, indirect, inguinal, umbilical, femoral
Additional Note:

Reflexes: Biceps:        Rt. absent, average, diminished, exaggerated.
                         Lt. absent, average, diminished, exaggerated.
          Knee Kick:     Rt. absent, average, diminished, exaggerated.
                         Lt. absent, average, diminished, exaggerated.
          Ankle Jerk:    Rt. absent, average, diminished, exaggerated.
                         Lt. absent, average, diminished, exaggerated.
          Abdominal:     Rt. absent, average, diminished, exaggerated.
                         Lt. absent, average, diminished, exaggerated.
          Romberg:       Absent, slight, marked.

Extremities: Tremor of extended fingers: none, fine, coarse. Additional note:

Malformations: (specify)


Fig.  19.— Constitutional  Form  A-6  in  reduced  facsimile. 

history  taken  in  a properly  conducted  office  consultation.  The  only 
important  distinction  lies  in  the  fact  that  in  this  work  all  the  data 
are  recorded  for  every  patient.  We  permit  no  blank  spaces  or  missing 
records. 

Constitutional Form A-7

Family Name:        Family No.
Name of Person:

LABORATORY STUDIES

Urine Examination

Date    Appearance    Reaction    Sp. gr.    Alb.    Sugar    Microscopic

Kidney function (phenolsulphonephthalein eliminated)

Date    During first hour       During second hour      Total
        Amount    Per cent      Amount    Per cent      Amount    Per cent

Wasserman Reaction on Blood Serum        Spinal Fluid Examination
Date    Result                           Date    Result

Basal metabolism: (give date and result)


Fig.  20. — Constitutional  Form  A-7  in  reduced  facsimile. 


Constitutional Form A-8

Family Name:        Family No.
Name of Person:     Date:

EXAMINATION OF EYE GROUNDS

Right eye: normal, abnormal (specify below)

Arteries:  Irregular diameter        Slight, medium, marked
           Reduced diameter          Slight, medium, marked
           Tortuous                  Slight, medium, marked
           Pulsating                 Slight, medium, marked
           Increased light reflex    Punctate, uniform
           Translucency              Diminished, absent
Veins:     Increased diameter        Slight, medium, marked
           Compressed by arteries    Slight, medium, marked
Retina:    Oedema                    Slight, medium, marked
           Exudate                   Slight, medium, marked
           Haemorrhages              Fresh, organized
Optic disc: (Describe abnormality)
Remarks:

Left eye: normal, abnormal (specify below)

Arteries:  Irregular diameter        Slight, medium, marked
           Reduced diameter          Slight, medium, marked
           Tortuous                  Slight, medium, marked
           Pulsating                 Slight, medium, marked
           Increased light reflex    Punctate, uniform
           Translucency              Diminished, absent
Veins:     Increased diameter        Slight, medium, marked
           Compressed by arteries    Slight, medium, marked
Retina:    Oedema                    Slight, medium, marked
           Exudate                   Slight, medium, marked
           Haemorrhages              Fresh, organized
Optic disc: (Describe abnormality)
Remarks:

Fig. 21. — Constitutional Form A-8 in reduced facsimile.


Taking up the forms in order, it will be noted that in Form A 1,
stress is laid first on the general conditions of the patient’s life, as
regards urban or rural residence, the kind of work done and its
demands on energy output, and second on his present illness. If
the patient has ever been treated before in the dispensary or hospital
full abstracts are made on other intercalated sheets of the
symptoms, objective findings, diagnosis, operations, etc., if any,
as recorded in these histories, and these are filed with the examination
forms. In many cases there are elaborate histories, involving
several departments and extending over many years.

Form  A 2 covers  the  history  of  surgical  operations,  hospital 
admissions,  the  occurrence  of  various  common  infections,  and  some 
important  chronic  organic  diseases.  Gonorrhea  and  syphilis,  which 
evidently  deserve  greater  detail,  are  taken  up  under  the  genito- 
urinary system,  on  Form  A 3. 

The  remainder  of  Form  A 2 and  Forms  A 3 and  4 are  devoted  to 
the  head  (including  the  hair,  headaches,  vision,  hearing,  the  nose, 
the  teeth,  the  thyroid),  the  cardio-respiratory,  the  gastro-intestinal, 
and  the  genito-urinary  systems,  and  to  the  menstrual  history. 

On  Form  A 4 one  third  of  the  space  is  devoted  to  the  important 
and  usually  neglected  subject  of  sexual  history.  In  addition  to  the 
total  number  of  pregnancies,  provision  is  made  for  the  recording 
of  not  only  the  number  of  living  and  dead  born  full-term  children 
but  the  number  of  miscarriages  (including  abortions)  and  of  pre- 
mature living  born,  with  the  periods  of  gestation  by  months. 

The  frequency  of  sexual  intercourse,  by  week,  month  or  year 
(by  five-  or  ten-year  periods  of  life  when  possible),  and  the  use  or 
non-use  of  what  were,  or  were  believed  to  be,  contraceptive  methods 
(including  induced  abortions)  are  set  down  under  these  headings 
with  as  much  detail  as  it  is  possible  to  obtain. 

In  training  the  clinicians  who  take  the  histories  the  following 
memorandum  of  instructions  is  used,  relative  to  the  questions  re- 
garding the  reproductive  history.  It  is  included  here  as  an  example 
of  the  sort  of  instructions  which  should  be  worked  out  for  compli- 
cated or  difficult  points  in  any  large,  systematic,  record-taking 
enterprise. 


Further  knowledge  concerning  the  correlation  between  frequency  of  intercourse 
and frequency of conception is urgently needed. Birth control, conscious and designed,
is effected by (a) abstinence in varying degree, or (b) contraceptive measures of various
sorts. To these must be added, in the world as it actually exists, the causing of the
expulsion of fertilized ova, or induced abortion. The nullifying effect of these measures
upon the consequences of frequent sexual intercourse is obvious.

Various  diseases,  such  as  syphilis,  pneumonia,  influenza,  malaria,  variola,  by 
killing  or  causing  the  expulsion  of  fertilized  ova,  cut  down  the  proportion  of  live  births 
per  marriage.  Of  these  the  most  important  is  thought  to  be  syphilis.  However,  the 
connection  between  the  others — particularly  pneumonia  and  influenza — and  abortions 
and miscarriages is to be inquired into carefully in each case.

The above considerations are sufficient to indicate the importance of careful and
painstaking  efforts  to  obtain  full  information  on  these  two  questions  of  sexual  habits 
and  birth  control. 

In regard to the first of these, detailed information concerning the actual frequency
of sexual intercourse during the individual’s entire sexual life is desired. To
record  once  or  twice  per  month  for  a man  or  a woman  aged  forty-five,  fifty  or  sixty 
years,  while  possibly  or  even  probably  correct  for  these  particular  ages,  fails  by  far  to 
give  the  true  history  of  this  physiological  activity  in  this  person.  The  other  ages 
are  of  equal  or  of  even  greater  importance. 

It  is  suggested  that  in  the  case  of  married  women  the  investigator  start  with 
the present time and then trace back by ten-year periods in even decades (i.e., 20-29,
30-39,  etc.)  to  the  beginning  of  married  life.  In  the  case  of  widows  the  questioning 
should  begin  with  the  period  immediately  previous  to  the  husband’s  death.  Multiple 
marriages  are  to  be  treated,  of  course,  on  the  same  plan  as  outlined  above.  In 
the  case  of  unmarried  women  who  acknowledge  having  been  pregnant,  begin  with  the 
particular  set  of  sexual  activities  leading  to  the  first  impregnation,  and  follow  the 
lead  backward  and  forward  from  that. 

The natural sequence for these questions (in the case of women) is that of the
history sheets, i.e., after the information regarding pregnancies. In some cases
the  investigator  may  judge  that  it  is  best  to  postpone  this  part  of  the  history  until 
after  the  patient’s  confidence  has  been  gained  more  securely. 

The  question  of  birth  control  is  best  approached  indirectly  and  without  the  use 
of  this  term.  For  instance,  a woman  of  thirty-five,  married  fifteen  years,  childless  or 
with  one  or  two  pregnancies  only,  should  be  asked  first  to  what  she  attributes  her 
failure  to  have  more  children,  and  the  question  of  preventive  measures  will  follow 
naturally.  Similarly  a woman  of  many  pregnancies  may  be  asked  very  naturally  if 
she  had  never  been  advised  or  tempted  to  try  preventive  measures  to  avoid  such 
frequent  conceptions. 

Frequent  abortions  or  miscarriages,  particularly  in  the  absence  of  some  disease 
of the generative organs, or of those previously mentioned, are suggestive of contraception,
and the woman should be questioned carefully concerning them, especially as
to  whether  any  of  them  followed  taking  medicine  or  the  use  of  mechanical  measures. 

Failure  to  conceive  is  frequently,  though  by  no  means  always,  associated  with 
gonorrhea,  puerperal  salpingitis,  uterine  displacements,  etc. 

Men  are  to  be  questioned  with  the  same  tact,  but  with  a somewhat  wider  latitude, 
especially  in  regard  to  premarital  intercourse  and  extramarital  intercourse;  also  in 
regard  to  impotence. 

Checking  the  statements  of  man  and  wife  independently  is  of  great  importance. 

When  contraceptive  measures  are  acknowledged  to  have  been  used,  find  out 
exactly  and  record  what  they  were,  and  the  patient’s  opinion  of  their  effectiveness. 
This  applies,  of  course,  to  persons  of  both  sexes.  It  is  probable  that  with  women  the 
questions  of  frequency  of  sexual  intercourse  and  birth  control  will  be  approached  best 
in  connection  with  the  number  of  pregnancies,  or  with  failure  to  conceive,  as  the 
case  may  be. 

It  is  important  here,  as  throughout  the  examination,  to  impress  the  patient  with 
the  idea  that  these  questions  may  have  a bearing  upon  his  or  her  particular  illness. 


Inquiry into the state and functioning of the bodily organs concludes
with affections of the skin, the nervous, and the locomotor
systems.

Anomalies are covered under the headings allergic history, congenital
malformations, tumors, and bleeding.

This  anamnesis  concludes  with  detailed  information  regarding 
gain  and  loss  of  weight,  duration  and  quality  of  sleep,  and  the  use  of 
tobacco  and  alcohol. 

The physical examination of the patient (Forms A 5 to A 8
inclusive) covers the systematic inquiry into the normal and
pathological anatomy of most of the organs and other structures
(the larynx and the genito-urinary organs excepted) which are
readily susceptible to examination in a medical clinic by inspection,
palpation, percussion, and auscultation, with the aid, where
necessary,  of  such  relatively  simple  instruments  of  precision  as  the 
thermometer,  the  watch,  the  reflex  hammer,  the  stethoscope,  the 
sphygmomanometer,  and  the  ophthalmoscope. 

It  will  be  noted  on  Forms  A 5 and  A 6 that,  in  respect  of  the 
scope  and  order  of  subject  matter,  this  examination  follows  closely 
the  clinical  history  recorded  on  Forms  A 1-4. 

The  relatively  considerable  space  and  detail  devoted  to  the 
heart  and  the  arteries  are  demanded,  both  by  the  comparative 
importance  of  these  organs  and  by  the  facility  with  which  their 
anatomy  and  physiology  may  be  investigated  by  simple  means. 
Results  of  laboratory  examinations  (routine  for  urine  and  blood 
Wassermann  tests)  are  recorded  on  Form  A 7.  In  addition  to  those 
listed,  other  tests  applicable  to  special  cases,  as  for  instance  spinal 
fluid  tests  in  syphilitics,  blood  clotting  time  and  blood  cell  counts 
for  hemophiliacs,  etc.,  are  recorded  here. 

On  account  of  the  unique  opportunity  afforded  by  the  retinal 
arteries  for  observing  the  state  of  minute  arterial  vessels,  Form  A 8, 
covering  the  examination  of  the  eye  grounds,  was  incorporated. 

Particular  attention  is  directed  to  the  fact  that  in  all  these  forms 
for recording the data of both clinical history and physical examination,
provision is made for indicating answers to questions in the
most  simple,  direct,  and  unambiguous  manner.  Wherever  it  is 
practicable  the  record  is  made  by  simply  drawing  a circle  about 
“yes”  or  “no,”  or  about  one  out  of  several  alternatives. 

A much more detailed and elaborate system of blanks, on a
similar general plan to those illustrated above, has been devised
and perfected by Dr. Halbert L. Dunn,5, 6 for use in making clinical
records in hospitals or in private practice. His system is
also  developed  thoroughly  for  mechanical  tabulation  of  the  records 
by  the  Hollerith  machine  described  later  in  this  chapter.  The 
reader  interested  in  such  medical  record  forms  should  also  consult 
the  account  of  the  successful  installation  of  such  a system  in  the 
Cincinnati  Children’s  Hospital  given  by  Hoyer  and  Mitchell.7 
It  is  interesting  to  note  that  these  latter  authors  do  not  recommend 
the  routine  installation  of  a mechanical  tabulating  system  for  all 
hospital  records,  although  their  original  record  forms  are  so  drawn 
as  to  facilitate  transference  to  punch  cards  at  any  future  time  if 
this  seems  desirable  for  particular  investigations. 

It  is  to  be  hoped  that  the  student  will  realize  that  the  idea  of 
such  blank  forms  for  medical  records  as  have  been  illustrated  here, 
or have been devised by Dunn, is not a new one. If he should
harbor any such erroneous idea he may well read, as a single corrective
example, Chapter VII, “On the Mode of Investigations
and Recording Cases,” in a treatise on the diseases of the ovaries,
published  in  1873,  by  the  distinguished  gynecologist  and  obstetrician, 
T.  Spencer  Wells.*  There  he  will  find  illustrated  an  excellent  set  of 
blanks,  embodying  all  of  the  essential  principles  discussed  in  this 
chapter  of  the  present  book. 

The  blanks  so  far  discussed  have  been  of  the  type  for  which  it  is 
intended  that  the  blank  shall  be  filled  out  by  an  expert  (in  the  forms 
discussed,  by  a physician)  at  the  time  of,  and  as  the  result  of  a 
personal  conference  with  the  subject.  There  will  now  be  given  an 
example  of  a different  type  of  record  form,  to  be  filled  out  by  the 
subject  himself.  In  the  nature  of  the  case  such  record  forms  must 
always  be  simpler  and  less  technical  in  character  than  those  of  the 
first  type. 

The  record  form  illustrated  in  Figs.  22-25  inclusive  has  been 
used  for  some  five  years  in  an  investigation  of  human  longevity.8 
It has been circulated chiefly to living persons ninety-five or more
years of age, and to a smaller extent to living persons between
ninety and ninety-five years of age. The blank is printed on good

* Wells, T. Spencer: Diseases of the Ovaries: Their Diagnosis and Treatment,
New York (Appleton), 1873.




THE  JOHNS  HOPKINS  UNIVERSITY 
INSTITUTE FOR BIOLOGICAL RESEARCH


Investigation  of  Longevity 

By filling in the information asked for on this form, you will be greatly aiding the program
of our investigation as to the factors which influence longevity. If for any reason you
are yourself unable to write in the information desired, will you not please get someone in
your household to fill it in for you. This form, after filling out, should be returned in the addressed
stamped envelope enclosed herewith to DR. RAYMOND PEARL, Institute for Biological
Research, 1901 East Madison Street, Baltimore, Maryland.


NAME 

ADDRESS 

WHERE  WERE  YOU  BORN?  WHEN  WERE  YOU  BORN? 

If born abroad in what year did you COME TO THIS COUNTRY?

How  OLD  were  you  when  you  came? 

How  many  BROTHERS  did  you  have?  How  many  SISTERS  did  you  have? 

Are  any  of  your  brothers  and  sisters  alive  now? 

If  so,  give  name  and  address. 

How MANY TIMES have you been MARRIED? What was your AGE when MARRIED?

DATE of MARRIAGES?

Give NAME of your first husband - wife.

How old was he - she at death?

When did he - she die (date)?

Give NAME of your second husband - wife.

How old was he - she at death?

When did he - she die (date)?

Was your HUSBAND’S - WIFE’S FAMILY especially LONG-LIVED?

(Give  any  particulars  that  you  know  of.) 


How  many  CHILDREN  have  you  had?  BOYS?  GIRLS? 

If  you  were  married  more  than  once  specify  how  many  CHILDREN  BY  EACH  HUSBAND  - WIFE 

How  many  of  your  CHILDREN  are  NOW  LIVING? 

How  many  GRANDCHILDREN  have  you  had? 

How  many  of  your  GRANDCHILDREN  are  NOW  LIVING? 

How  many  GREAT-GRANDCHILDREN  have  you  had? 

PLEASE  TURN  OVER 


Fig.  22. — First  page  of  longevity  record  form.  Reduced  facsimile. 


quality bond paper, on both sides of two sheets, making a four-page
leaflet, 8½ x 11 inches in size.

The greatest desideratum in a blank to be filled out by the subject
himself is clearness, so that there can be no mistaking the


YOUR FAMILY

RELATIVE

Put in this column the AGE OF THE RELATIVE AT DEATH, OR if LIVING, the AGE NOW.

Put in this column the CAUSE OF DEATH so far as you know it. If person is not dead, write “LIVING”.

Your FATHER

Your MOTHER

Your FATHER’S FATHER

Your MOTHER’S FATHER

Your FATHER’S MOTHER

Your MOTHER’S MOTHER

Your CHILDREN (list each child separately by NAME) here

Put in this column the AGE OF EACH CHILD AT DEATH, or if LIVING, the AGE NOW.

Put in this column the CAUSE OF THE CHILD’S DEATH, so far as you know it. If child is not dead, write “LIVING”.

1st

2nd

3rd

4th

5th

6th

7th

8th

9th

10th

11th

12th

Your BROTHERS and SISTERS (list each one separately)

NAME here

Put in this column the AGE OF EACH BROTHER AND SISTER AT DEATH, or if LIVING, the AGE NOW.

Put in this column the CAUSE OF THE BROTHER’S OR SISTER’S DEATH. For those not dead, write “LIVING”.

Fig.  23. — Second  page  of  longevity  record  form.  Reduced  facsimile. 

meaning  of  the  question  to  which  an  answer  is  wanted.  Another 
point  to  be  kept  in  mind  is  to  ask  for  some  of  the  same  information, 
in  diverse  ways,  in  different  parts  of  the  blank,  in  order  to  have  a 




MEDICAL  BIOMETRY  AND  STATISTICS 


check  on  the  care  and  reliability  with  which  the  blank  has  been 
filled  out. 


PERSONAL HABITS AND HEALTH

To what extent have you USED ALCOHOLIC BEVERAGES during your life?

WINE? BEER?

WHISKEY or other SPIRITS?

To what extent and in what form have you USED TOBACCO?

PIPE? CIGARS?

CIGARETTES? CHEWING?

SNUFF?

How has your HEALTH been generally throughout life?

Have you ever had MEASLES? SCARLET FEVER? WHOOPING COUGH?

TYPHOID FEVER? MALARIA? SMALLPOX? PNEUMONIA?

DIPHTHERIA? GOITER? OTHER SERIOUS ILLNESS?

Have you ever undergone a SURGICAL OPERATION?

If so, please state its NATURE, and the DISEASE for which it was undertaken?

WHAT WAS YOUR AGE AT THE TIME?

Please state any other DETAILS ABOUT YOUR HEALTH which you think might be of interest.

What have been your general HABITS during life as to EATING, DRINKING, SLEEPING and WORKING?

TO WHAT DO YOU CHIEFLY ATTRIBUTE YOUR LONG LIFE?

PLEASE TURN OVER

Fig.  24. — Third  page  of  longevity  record  form.  Reduced  facsimile. 

Whenever  the  nature  of  the  investigation  permits  it,  there  are 
certain  definite  advantages  in  having  the  original  records  made  on 
card forms, rather than in record books, or loose leaves of paper.
If the records are on cards the work of subsequent tabulation of the
data  is  greatly  facilitated.  Furthermore,  the  problem  of  filing  the 
records  for  ready  reference  is  simplified. 

RESIDENCE,  OCCUPATION,  ETC. 

In  what  PLACES  have  you  RESIDED  at  different  times  in  your  life? 

Have  you  LIVED  mostly  in  the  COUNTRY  or  CITY? 

What  OCCUPATIONS  have  you  followed  at  different  times  during  life? 

To  what  extent  have  you  done  HARD  MANUAL  LABOR? 

What  is  your  RELIGIOUS  FAITH? 

To  what  RACE  STOCK  (English,  Scotch,  Irish,  German,  French,  etc.),  do  you  chiefly  belong? 

What  is  your  HEIGHT?  AVERAGE  WEIGHT? 

How  has  your  WEIGHT  CHANGED  since  you  were  25  years  of  age? 

What,  in  general,  has  been  your  BUILD  DURING  ADULT  LIFE? 

A.  THIN  AND  LEAN? 

B.  MODERATELY  THICK-SET  OR  CHUNKY? 

C.  DISTINCTLY  FAT? 

Color  of  hair  at  age  25?  Now? 

Color  of  eyes? 

Were  you  a blond  or  a brunette? 

BY WHOM WAS THIS BLANK FILLED OUT?

WHAT  IS  YOUR  RELATION  TO 

PLEASE  GIVE  ME  THE  NAME  AND  ADDRESS  OF  ANY  OTHER  RELATIVE  WHO  MIGHT  BE  ABLE 
TO  FURNISH  ADDITIONAL  OR  MISSING  INFORMATION 

DATE  WHEN  THIS  BLANK  WAS  FILLED  OUT 

PLEASE  TURN  OVER 

Fig.  25. — Fourth  page  of  longevity  record  form.  Reduced  facsimile. 

An example of a card form for original records is shown in Figs.
26 and 27. This is printed on medium weight card stock, of the
best quality. It is 5 x 7 inches in size. This form was intended to
be filled out by physicians in obstetrical clinics, to provide information
about normal fertility and birth control, as actually practiced,
which might serve in some part as an independent check
upon statements reported from time to time by birth control
organizations.*

Fig. 26. Reduced facsimile.

INSTRUCTIONS.

THE USUAL METHODS OF PREVENTING CONCEPTION FALL IN THE FOLLOWING
GENERAL CLASSES: COITUS INTERRUPTUS (WITHDRAWAL); CONDOM; PESSARY;
ABSTINENCE FROM INTERCOURSE DURING PART OF MONTH; VAGINAL DOUCHES,
PLAIN OR MEDICATED; MEDICATED SUPPOSITORIES. IN FILLING OUT BLANK BE SURE TO GIVE CLEAR DETAILS AS
TO WHICH OF ABOVE METHODS, OR OTHER METHOD, THE PATIENT HAS PRACTISED. UPON THE DEFINITENESS AND
PRECISION OF THE INFORMATION ON THIS POINT DEPENDS THE SIGNIFICANCE OF THE DATA FOR THE RESEARCH
PLANNED.

REMARKS:

Fig. 27. Reverse of card record form shown in Fig. 26.


* The  writer  will  be  glad  to  furnish  a supply  of  these  card  forms  to  the  obstetrical 
department  of  any  hospital  willing  to  co-operate  in  the  investigation,  and  return  the 
blanks properly filled in.

By criticising and improving the record forms which are given
as  examples  in  this  chapter  the  student  will  learn  more  in  a practical 
sense  of  the  principles  involved  in  the  construction  of  such  blanks 
than  can  be  imparted  by  any  amount  of  didactic  precepts. 

THE  PRESERVATION  OF  CASE  HISTORIES 

Turning  to  the  question  of  the  way  case  histories  are  handled 
after  they  are  written,  which  is  essentially  a matter  solely  of  business 
or  office  management  and  not  of  medicine  or  science,  there  are  two 
defects  in  the  common  practice.  These  relate,  first,  to  the  fixation 
of  responsibility  for  the  recording  of  each  item  in  the  history,  and, 
second,  to  the  filing  of  the  completed  histories.  From  every  point 
of  view,  whether  of  administration,  research  or  other,  it  is  of  the 
highest  importance  that  future  students  of  a hospital’s  records 
should  know  who  is  responsible  for  statements  appearing  in  a 
history.  How  often  has  one  heard  long  and  inconclusive  debates 
as  to  what  interpretation  was  to  be  put  upon  some  statement  in  a 
history  as  to  a clinical  finding?  The  decision  really  depended  upon 
who  originally  was  responsible  for  the  statement.  If  it  were  the 
considered  verdict  of  the  wise  and  experienced  old  professor,  it  was 
one  thing;  if  it  were  the  snap  judgment  of  the  latest  intern,  it  was 
quite  another.  All  this  difficulty  can  be  removed  by  inaugurating 
and  practising  the  principle  that  every  sheet  of  a history  shall  bear 
upon  its  face  the  names  of  the  person  or  persons  responsible  for 
what  appears  upon  that  page.  Perhaps  a word  of  caution  needs 
to  be  added  lest  there  should  be  some  misunderstanding.  Fixation 
of responsibility is not to be construed as an excuse for any weakening
of the rigid canons of extreme objectivity in history or protocol
writing,  now  generally  taught  in  all  first-class  medical  schools. 

The purpose of filing case histories is twofold: first, to preserve
them,  and,  second,  to  do  it  in  such  a way  as  to  make  them  most 
readily  accessible  to  anyone  who  may  in  the  future  want  to  consult 
them.  There  can  be  no  question  that  this  latter  purpose  will  best 
be  served  by  the  so-called  “unit  system”  of  case  histories,  in  which 
the  hospital’s  complete  record  about  any  one  individual  forms  one 
separate  and  distinct  volume.  The  advantages  of  this  method  of 
preserving histories over the far more common system of binding
them up in great volumes in numerical or temporal sequence, are
so  obvious  as  not  to  need  detailed  exposition.  Such  a method  of 
handling  the  completed  records  is  really  essential  to  their  most 
efficient  utilization,  whether  for  statistical,  investigational,  or  any 
other  purpose. 

MECHANICAL  TABULATION 

There  are  certain  items  of  information  which  ought  to  be  and 
generally  are  intended  to  be  included  in  every  case  history.  Some 
of  these  routine  items  are: 

1.  Case  number. 

2.  Service  number. 

3.  The  patient’s  name. 

4.  Diagnosis. 

5.  Sex. 

6.  Social  status  (single,  married,  widowed,  divorced). 

7.  Age. 

8.  Occupation. 

9.  Body  weight. 

10.  Stature. 

11.  Race. 

12.  Birthplace. 

13.  Service  under  which  patient  was  treated. 

14.  Date  of  admission  to  the  hospital. 

15.  Duration  of  stay  in  hospital. 

16. Time from onset of diagnosed condition to admission to hospital.

17.  Condition  at  admission. 

18.  General  health  of  patient  prior  to  present  illness. 

19. Whether there is any family history of the diagnosed disease.

20.  Whether  a first  entry  or  a readmission. 

21.  Whether  a free,  a paying,  or  a part-paying  case. 

22.  Condition  at  discharge. 

23.  Whether  or  not  an  autopsy  was  performed. 

24.  Autopsy  number,  if  any. 

25.  Nature  of  treatment. 

26. Complicating pathologic conditions, additional to the one diagnosed.

In an ideal system of handling such records each history should
be  completely  cross-indexed  under  each  one  of  the  following  items 
in  the  above  list  at  least:  1 to  18  inclusive,  21,  22,  23,  24,  25.  Of 
course,  nothing  like  such  complete  cross-indexing  as  this  is  even 
attempted,  not  to  say  accomplished. 

There is only one method now known, whereby in a practical
way such an amount of cross-indexing can possibly be accomplished.
That method is to handle the routine information by the
modern system of mechanical tabulating and indexing.* On this
system the original records are transferred, by means of a machine
called a “key punch” (cf. Fig. 28), to cards, the record on the card
appearing as a series of punched holes. Then, by means of another
machine, known as a “sorter” (cf. Fig. 29), the punched cards
can be mechanically sorted, at a rate of about 350 to 400 cards per
minute, into any desired arrangement relative to any rubric or
item of information recorded upon the cards.

Fig. 28. Electric key punch for transferring written records to cards to be used in
mechanical tabulation and indexing.

* The most generally useful and flexible system of mechanical tabulation now
available is that known as the Hollerith system, from its inventor, Mr. Herman
Hollerith. The machines of that system are the ones illustrated here. Further information
about these machines may be obtained from the manufacturers, The Tabulating
Machine Company Division of the International Business Machines Corporation,
50 Broad St., New York City. It may be of interest to medical readers to
know that a distinguished physician, the late Dr. John S. Billings, had a great deal
to do with the initiation and early development of this invention. He was a close
friend and adviser of Mr. Hollerith all through the early stages.

Fig. 29. Horizontal sorting machine.

Let us suppose, for example, that someone wishes to assemble
for study all the cases of lobar pneumonia which have been treated
in the hospital. Suppose the diagnostic code number for lobar
pneumonia is 102. One has then only to run the cards through the
sorter relative to the field designated “diagnosis” and pick out,
after the cards have been mechanically arranged in numerical order,
all those bearing the punched number 102 in the diagnosis field.
These 102’s will all be together in one bundle, and they will be all
the lobar pneumonia cases in the hospital’s records. Each card
will bear the case number, from which, of course, the original
histories can be consulted if one desires. If one particularly wishes
to study the lobar pneumonia of negroes, he need only take his
bundle of “diagnosis 102” cards, run through the sorter again
relative to “race” and he will in a few moments have all the cases
of this disease in negroes separated out by themselves. Suppose
he is further only interested in lobar pneumonia in negro children
under five years of age, say. He need only take his bundle of
negro lobar pneumonia cases and put them through the sorter
again, retaining this time only those falling into ages under five.
He gets his results at the rate of 350 to 400 a minute. Compare
this with the laborious process that would be involved in assembling
by hand from an ordinary card catalogue of hospital case records the
case history numbers of all the cases of lobar pneumonia in negro
children under five ever treated in the hospital. The comparison is
as of hours with weeks or even months if the histories be numerous.
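
The successive runs through the sorter described above amount to grouping the cards on one field at a time and carrying forward only the bundle of interest. The following is a minimal modern sketch in Python, not a description of the Hollerith machine itself. The `Card` class, the field names, the single-letter codes in the `race` field, and the sample cards are all hypothetical and are not drawn from any actual code book; only the diagnosis code 102 comes from the example in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Card:
    """One punched card; every field name here is an illustrative assumption."""
    case_number: int
    diagnosis: int  # coded diagnosis; 102 is the text's example code for lobar pneumonia
    race: str       # hypothetical single-letter code
    age: int        # age in years


def sort_pass(cards, field):
    """Group cards into pockets by the value of one field, like one run through the sorter."""
    pockets = {}
    for card in cards:
        pockets.setdefault(getattr(card, field), []).append(card)
    return pockets


# Made-up cards for illustration only.
cards = [
    Card(1, 102, "A", 3),
    Card(2, 102, "B", 40),
    Card(3, 57, "A", 4),
    Card(4, 102, "A", 30),
    Card(5, 102, "A", 2),
]

# First pass: sort on diagnosis and keep the bundle coded 102.
pneumonia = sort_pass(cards, "diagnosis").get(102, [])

# Second pass: sort that bundle on race and keep one group.
group_a = sort_pass(pneumonia, "race").get("A", [])

# Third pass: sort on age and retain only the pockets for ages under five.
by_age = sort_pass(group_a, "age")
under_five = [card for age in sorted(by_age) if age < 5 for card in by_age[age]]

# The case numbers lead back to the original histories.
case_numbers = [card.case_number for card in under_five]
print(case_numbers)
```

Each pass looks at a single field, which is why the order of the passes does not affect which cards remain at the end.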

Again, suppose that a complete group of like case histories has
been  assembled  by  painfully  laborious  hand  processes,  and  one 
wishes  then  to  make  a statistical  tabulation  of  the  numerical  facts 
they  contain.  Weeks  or  months  may  easily  be,  and  often  are,  spent 
upon  the  process.  But  if  the  records  are  upon  punched  cards,  the 
pertinent  cards,  which  have  been  mechanically  assembled,  need 
only  be  run  again  through  another  machine,  known  as  a “tabulator” 
(cf.  Fig.  30),  and  the  results  relative  to  any  desired  category  of 
information  will  be  mechanically  counted  with  great  rapidity  and 
absolute  accuracy,  and  the  columns  of  figures  will  at  the  same  time 
be  added.  At  either  of  two  stages  in  the  process  the  results  may  be 
automatically  taken  off  in  printed  form,  from  the  machine,  if  it  is 
desired  to  do  so.  The  electric  accounting  machine,  shown  in  Fig. 
30,  tabulates  and  prints  the  final  results  of  the  preceding  operations. 
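
The two things the tabulator does with an assembled bundle, counting the cards in each category and adding a column of figures, can be sketched as follows. This is a hedged modern illustration in Python under assumed names: the `tabulate` function, the field names `sex` and `days_in_hospital`, and the three sample records are hypothetical and are not taken from the text.

```python
from collections import Counter


def tabulate(records, category, amount):
    """Count the records in each category and add up one numeric field per category."""
    counts = Counter()
    totals = Counter()
    for record in records:
        key = record[category]
        counts[key] += 1               # the count of cards in this category
        totals[key] += record[amount]  # the running sum of the chosen column
    return dict(counts), dict(totals)


# Made-up records for illustration only.
records = [
    {"sex": "F", "days_in_hospital": 12},
    {"sex": "M", "days_in_hospital": 7},
    {"sex": "F", "days_in_hospital": 20},
]

counts, totals = tabulate(records, "sex", "days_in_hospital")
print(counts)  # number of cases per category
print(totals)  # total days in hospital per category
```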

It  falls  outside  the  scope  of  this  book  to  go  in  detail  into  the 
theory  and  applications  of  mechanical  tabulation.  The  student 
who  wishes  to  become  familiar  with  the  scope  and  possibilities  of 
modern mechanical tabulating will do well to apply to the Tabulating
Machine Company Division of the International Business
Machines  Corporation,  50  Broad  St.,  New  York,  for  literature 
regarding  its  application  in  various  fields. 

In  the  statistical  offices  of  up-to-date  departments  of  health, 
and  in  census  offices,  the  mechanical  system  of  tabulating  the  data 
from  birth  and  death  certificates  is  employed.  The  economies  so 
effected,  both  in  time  and  money,  are  very  great.  The  student 
interested in this aspect of the subject should get and study the
card  forms  and  codes  used  in  representative  health  departments. 

A single example of a Hollerith punch card form and its application
may be given here. It is taken from Dunn and Rockwood.6
In their paper they illustrated their original record forms by filling
them out for the hypothetical case of an imaginary Mrs. H. Brown.
Dr. Halbert L. Dunn has kindly given permission for the reproduction
here of his discussion of the punch card form. Figure 31 represents
the Hollerith card punched to represent the coded information
given in the hypothetical original record form.

The  first  six  columns  indicate  the  case  number  of  the  chart.  Case  No.  1 would 
be punched as 000001 in columns 1, 2, 3, 4, 5, and 6, respectively. The record number
of Mrs. H. Brown, 67190, is punched as 067190 in the first six columns. The highest
number  which  can  be  recorded  is  999999  and  allows,  therefore,  for  a tabulation  of  that 
many  separate  patients. 
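
The six-column case number is simply the number written with leading zeros. A small Python sketch, in which the function name `punch_case_number` is a hypothetical label chosen for illustration, reproduces the two examples given above:

```python
def punch_case_number(case_number: int, width: int = 6) -> str:
    """Return the digits punched in the case-number field, one digit per column."""
    if not 0 <= case_number < 10 ** width:
        raise ValueError(f"case number must fit in {width} columns")
    return str(case_number).zfill(width)


print(punch_case_number(1))      # 000001
print(punch_case_number(67190))  # 067190
```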

Columns  from  7 to  21  are  set  aside  for  diagnoses.  Three  columns,  7,  8,  and  9, 
are  allotted  for  the  first  diagnosis  and  two  columns  each  are  allowed  for  the  remaining 
diagnoses.  The  first  column  in  each  diagnostic  field,  namely,  7,  10,  12,  14,  16,  18, 
and  20  can  be  double-punched.*  In  each  of  these  first  columns  the  blank  position 
represents  1,  the  X position  2,  and  the  zero  position  3 in  units  of  10.  It  is  possible 
by  this  means  to  indicate  forty  numbers  in  one  column.  The  number  zero  will  be 
indicated  if  a hole  is  not  punched  in  the  column,  and  numbers  from  1 to  9 if  the  printed 
numbers  from  1 to  9 are  punched.  If  the  blank  position  is  punched,  it  will  signify 
code  No.  10.  The  numbers  from  11  to  19  will  be  indicated  by  a double  punch,  namely, 
the  blank  position  for  the  1 in  units  of  10  and  the  proper  unit  number  from  1 to  9; 
No.  20  by  a single  punch  in  the  X position  which  represents  2 in  units  of  10;  Nos.  21 


Fig.  31. — An  index  card  for  the  general  medical  examination,  which  has  been  punched 
for  the  illustrative  diabetic  record  of  Mrs.  H.  Brown. 


to  29  by  a double  punch,  one  of  which  is  the  X position  representing  2 in  units  of  10 
and  the  other  the  proper  unit  number  from  1 to  9;  No.  30  by  a single  punch  in  the  zero 
position  which  stands  for  3 in  units  of  10  and  numbers  from  31  to  39  by  a double 
punch,  one  of  which  is  the  zero  position  representing  3 in  units  of  10  and  the  other 
the  proper  unit  number  from  1 to  9. 
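
The double-punch rule just described can be restated as a short sketch in Python. It is offered only as an illustration: the function names `punches_for` and `code_from`, and the labels "blank", "X", and "0" for the three upper punch positions, are assumptions made here and form no part of the published card system.

```python
# Tens markers as described in the text: the blank position stands for 1,
# the X position for 2, and the zero position for 3, in units of ten.
TENS_POSITION = {1: "blank", 2: "X", 3: "0"}
POSITION_TENS = {label: tens for tens, label in TENS_POSITION.items()}


def punches_for(code):
    """Return the list of positions punched for a code number from 0 to 39."""
    if not 0 <= code <= 39:
        raise ValueError("a double-punched column holds only 0 to 39")
    tens, unit = divmod(code, 10)
    punches = []
    if tens:                      # 10, 20, 30 need the tens marker
        punches.append(TENS_POSITION[tens])
    if unit:                      # units 1 to 9 use the printed digit
        punches.append(str(unit))
    return punches                # code 0 is an unpunched column


def code_from(punches):
    """Read a list of punched positions back into its code number."""
    value = 0
    for position in punches:
        if position in POSITION_TENS:
            value += 10 * POSITION_TENS[position]
        else:
            value += int(position)
    return value


if __name__ == "__main__":
    for n in (0, 7, 10, 17, 20, 30, 39):
        print(n, punches_for(n))
```

Because there is no unit punch for zero, the zero position can serve as a tens marker without ambiguity, which is how forty numbers fit into a single column.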

The second column in each diagnostic field is punched in one position only.
Eleven positions may be indicated in this column which are, respectively, the blank
position, 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9. It is possible, therefore, to code into hundreds
in each two-column field for any given diagnosis. The numbers would read serially
as 00-blank, 000, 001, 002, 003, 004, 005, 006, 007, 008, 009, 01-blank, 010, 011, 012,
etc., up to the highest number which would be 399. Counting the blanks and zeros
as separate numbers, this code permits the division of each diagnostic field into forty
major headings in the first column with a subdivision of each of these into eleven
subsidiary units in the second column. The total number of items which it is possible
to list in the two-column field by this process is 440.

* In order to accord with the wiring possibilities of the printing tabulator only
twenty-five columns can be double-punched. We have taken advantage of every one
of these possibilities in the proposed cross-index card.

A survey made in several hospitals shows that usually in about 80 per cent. of the
case  records  there  is  only  one  diagnosis.  Multiple  diagnoses  up  to  four  are  fairly 
common,  and  over  seven  extremely  unusual.  The  first  diagnosis,  columns  7,  8,  and 
9 has  a third  column,  No.  9,  which  permits  the  subdivision  of  each  of  the  440  items  in 
the  first  two  columns  into  eleven  subsidiary  units  allowing  for  4840  items  in  the  code 
of  the  first  primary  diagnosis. 
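
The counts of 440 and 4840 items follow directly from the number of positions available in each column. The arithmetic, and the serial reading of the two-column codes, may be sketched as follows; the helper name `two_column_codes` and the text form of each code are assumptions chosen here for illustration.

```python
FIRST_COLUMN_CODES = 40    # 0 to 39, obtained by double punching
SINGLE_COLUMN_CODES = 11   # the blank position, then 0 to 9

two_column_items = FIRST_COLUMN_CODES * SINGLE_COLUMN_CODES
three_column_items = two_column_items * SINGLE_COLUMN_CODES


def two_column_codes():
    """List the two-column codes in serial order: 00-blank, 000, 001, ..."""
    second_column = ["-blank"] + [str(d) for d in range(10)]
    return [f"{first:02d}{second}"
            for first in range(FIRST_COLUMN_CODES)
            for second in second_column]


if __name__ == "__main__":
    codes = two_column_codes()
    print(two_column_items, three_column_items)
    print(codes[:4], codes[-1])
```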

If  there  should  be  multiple  diagnoses  of  eight  or  more  numbers  for  any  given 
hospital  record,  the  code  number  of  these  diagnoses  should  be  written  in  ink  on  the 
back  of  the  card.  Any  card  which  has  been  punched  for  seven  diagnoses  must  be 
examined  for  written  code  numbers  on  the  back.  It  is  estimated  that  this  event  should 
not  occur  more  than  once  in  two  or  three  hundred  times. 

In  the  record  of  Mrs.  H.  Brown,  illustrated  in  Fig.  31,  there  are  seven  diagnoses. 
These  diagnoses  with  their  respective  code  numbers  are  as  follows:  (1)  Diabetes  melli- 
tus  (6  blank);  (2)  obesity  (60);  (3)  diabetic  acidosis  (63);  (4)  general  arteriosclerosis 
(80);  (5)  hypertension  (81);  (6)  chronic  constipation  (124),  and  (7)  hemorrhoids  (126). 

The  diagnostic  code  has  not  been  filled  out  to  thousands  and,  consequently,  the 
third  column  in  the  first  diagnosis  is  left  unpunched.  The  code  number  of  diagnosis  1 
(6  blank)  is  punched  in  columns  7 and  8;  that  of  diagnosis  2 (60)  in  columns  10  and  11; 
of  3 (63)  in  columns  12  and  13;  of  4 (80)  in  columns  14  and  15;  of  5 (81)  in  columns 
16  and  17;  of  6 (124)  in  columns  18  and  19,  and  of  7 (126)  in  columns  20  and  21. 

The  next  four  columns  of  the  punch-card  from  22  to  25  represent  the  items  of 
age,  sex,  color,  outcome  and  civil  state.  Age  is  coded  in  columns  22  and  23;  sex  (male 
or  female)  and  color  (white  or  black)  in  column  24;  outcome  (well,  improved,  same, 
worse  or  dead)  and  civil  state  (single,  married,  widowed,  divorced  or  separated)  in 
column  25.  For  example,  the  age  of  Mrs.  H.  Brown  is  indicated  as  49  in  columns  22 
and  23,  the  sex  (female)  and  color  (white)  are  coded  as  No.  2 in  column  24,  and  civil 
state  married  and  outcome  improved  by  code  No.  7 in  column  25. 

The  main  divisions  of  the  history  occupy  four  columns  from  26  to  29;  each  is 
double-punched,  using  the  blank  key  as  1,  the  X key  as  2 and  the  zero  key  as  3 in 
units  of  ten.  By  this  means,  numbers  up  to  forty  can  be  indicated  in  the  same  manner 
as  described  for  the  first  column  of  each  diagnostic  field.  Column  26  represents  past 
illnesses  including  abnormalities  in  the  weight  curve;  27,  the  respiratory,  circulatory, 
and  gastro-intestinal  systems;  28,  the  genito-urinary  and  nervous  system;  and  column 
29,  the  routine  history  and  the  clinician’s  opinion  of  the  accuracy  of  the  history. 

Only thirty-one numbers of the possible forty are used in each one of these columns.
Each of the thirty-one numbers represents an abnormality or combination of
abnormalities. It is possible to list five separate items or any combination of these
five items by use of a combination code printed on the history. The past illness,
for instance, is divided into five subdivisions. If any one of the diseases from typhoid
to scarlet fever has been checked in the past illness, No. 1 is marked as positive. If
the patient has had syphilis, gonorrhea or a history of abnormal weight curve,* No. 2
is checked; if he has had a major operation or accident, No. 3; a nose or throat operation,
No. 4; and if there is some important item in the miscellaneous, not included in
the routine list of past illnesses, No. 5 is checked.

* The code position of the weight curve is placed with past illnesses, while its chart
position naturally falls after the routine question by systems. It represents the only
divergence in the arrangement by order between the index code and the printed chart.

If  the  patient  has  only  one  of  these  five  conditions,  for  instance  No.  2,  the  coded 
number  would  be  identical  to  the  number  checked  on  the  chart.  If,  however,  he  had 
items  1,  2,  and  4 checked,  this  combination  would  be  indicated  by  code  No.  17  in 
column  26  of  the  index  card. 

In Fig. 31 code No. 26 is punched in column 26 representing positive observations
in the past illness in items 1, 2, 3, and 4. Likewise, code No. 17 in column 27
represents  positive  observations  in  items  1,  2,  and  4. 
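
The combination code itself is printed on the history form and is not reproduced in the text. One numbering that agrees with every example given (a single item keeps its own number, items 1, 2, and 4 give 17, and items 1, 2, 3, and 4 give 26) is to list the combinations by size and, within each size, in ascending order. The sketch below builds that table; the ordering is an inference from the three examples and should be treated as an assumption, not as the published code.

```python
from itertools import combinations

ITEMS = (1, 2, 3, 4, 5)   # the five subdivisions checked on the chart


def build_combination_code(items=ITEMS):
    """Number every non-empty combination: singles first, then pairs, etc.

    ASSUMPTION: this ordering is inferred from the worked examples only.
    """
    table = {}
    number = 1
    for size in range(1, len(items) + 1):
        # combinations() yields tuples in ascending (lexicographic) order
        for combo in combinations(items, size):
            table[combo] = number
            number += 1
    return table


CODE = build_combination_code()

if __name__ == "__main__":
    print(len(CODE))               # how many code numbers are used
    print(CODE[(1, 2, 4)])         # the example for column 27
    print(CODE[(1, 2, 3, 4)])      # the example for column 26
```

Five items give 2^5 - 1 = 31 non-empty combinations, which accounts for the thirty-one numbers used in each of these columns.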

Body height, grouped by classes, is given in column 30, and body weight, also by
classes, in column 31. Mrs. H. Brown has a stature of 64 inches (162.56 cm.)
represented by code No. 7 in column 30, and a body weight of 160 pounds (74.4 kg.)
indicated by code No. 7 in column 31.

The physical examination occupies eight columns, from 32 to 39, each one of
which is double-punched, so that it can represent numbers up to 40. All of the code
numbers in the physical examination stand for abnormalities. In column 32,
abnormalities of the head and face are noted; in 33, of the mouth and throat; in 34, of the
neck, spine and thorax; in 35, of the chest and lungs; in 36, of the heart; in 37, of the
vessels and abdomen; in 38, of the extremities and neurologic symptoms; and in 39, of
the lymph nodes, skin, genitalia, rectum, and abnormal psyche.

Abnormal  laboratory  observations  are  indicated  in  columns  40  to  43. 

Columns  44  and  45  have  been  set  aside  for  miscellaneous  conditions  and  may  be 
assigned in any way desired by a specific institution. We suggest a certain arrangement
which may or may not be followed. In this arrangement, four conditions are
coded  in  column  44  which  might  exist  in  any  diagnosis  or  any  case,  namely,  autopsy, 
major  operations,  minor  operations,  and  previous  admissions.  This  leaves  a fifth 
blank  space  for  some  special  interest  still  unassigned.  Column  45  could  be  reserved 
for  the  indication  of  the  principal  service  on  which  the  patient  was  treated.  Many 
hospitals  would  not  desire  to  make  such  a distinction  between  their  records. 
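
The column assignments described above may be gathered into a single layout table, which also makes it easy to confirm that the fields fill the forty-five columns without gap or overlap. The sketch below is an illustration only; the field labels and the names `CARD_LAYOUT`, `check_layout`, and `field_of` are assumptions introduced here.

```python
# (field, first column, last column), taken from the description of the card
CARD_LAYOUT = [
    ("case number", 1, 6),
    ("diagnoses", 7, 21),
    ("age", 22, 23),
    ("sex and color", 24, 24),
    ("outcome and civil state", 25, 25),
    ("history", 26, 29),
    ("body height class", 30, 30),
    ("body weight class", 31, 31),
    ("physical examination", 32, 39),
    ("laboratory observations", 40, 43),
    ("miscellaneous", 44, 45),
]


def check_layout(layout, total_columns=45):
    """Return True if the fields tile columns 1..total_columns exactly."""
    expected = 1
    for _, first, last in layout:
        if first != expected or last < first:
            return False
        expected = last + 1
    return expected - 1 == total_columns


def field_of(column, layout=CARD_LAYOUT):
    """Name the field to which a given column belongs."""
    for name, first, last in layout:
        if first <= column <= last:
            return name
    raise ValueError("column outside the card")


if __name__ == "__main__":
    print(check_layout(CARD_LAYOUT))
    print(field_of(24), "|", field_of(36))
```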

The punch-card form illustrated in Fig. 31 is on the old standard
45-column card. There is now available for the tabulating machine
equipment  illustrated  an  80-column  card  of  the  same  dimensions 
as  the  45-column  card.  The  obvious  advantage  of  this  is  that  a 
much  greater  amount  of  information  can  be  put  upon  a single 
card. 

Out of some twenty years’ experience with mechanical tabulation
in various fields the writer may perhaps be permitted to state
briefly his considered evaluation of it for scientific research purposes.
Wherever the problem to be dealt with is one either of (a)
cross-indexing a mass of data so that all original records falling in
a particular category of manifold characteristics may be quickly
picked out from a file containing a large mass of diverse, separate
records, or (b) tabulating a mass of purely and inherently numerical
observations, mechanical tabulation is without any rival. It can
do these two jobs more quickly, more accurately, and in every
way better than any other known mode of procedure. If the
investigator has problems involving either of these types of operation
he would be as foolish not to employ punched cards and
mechanical tabulation as he would be if he insisted on making his
journeys about the country in a stage coach in preference to the
railroad train or the automobile.

When, on the other hand, an investigator has to deal with data
which are inherently not numerical in character, but have to be made
so artificially by some process of coding, the case is not quite the
same. The punch card then automatically destroys all qualifications,
shadings or half-tones in the original written records. The
system is inherently rigid. Long observation indicates that while
the young investigator does not much mind this feature, the older,
more critical, and perhaps wiser investigator prefers to make his
tabulations of inherently qualitative, “judgment,” data directly
from the original record, rather than from a punch card, which
can only tell him, in each particular case, the rigid category into
which he, or somebody else, at some time decided that a particular
observation was to be put. The real point seems to be that the
investigator’s point of view frequently, and rightly, changes during
the course of a long investigation. As he penetrates deeper into
a mass of observational material he sees meaning and relationships
in it, of which he had no conception at the start. These things alter
his views as to the significance and disposition of particular individual
observations. But transferral of the records via the code
route to punch cards, if the system is to display its potential smooth
and accurate efficiency, must take place at the beginning and not
at the end of the investigation. Experience shows that one becomes
more and more cautious about choosing this route, and more and
more inclined to adopt other devices, which, while retaining the
possibility of immediate reference to the original written record
about qualitative observations in each individual instance, adopt
some of the features of the punch card system which facilitate
rapid and accurate tabulation.

A simple form designed for precisely this purpose is shown in
Fig. 32.

This  card  form  was  devised  for  investigations  on  autopsy 
records.10  In  each  of  the  spaces  following  an  organ  designation  the 
essential  statements  about  the  lesions  of  that  organ,  as  given  in 
the  original  protocol  by  the  pathologist,  are  copied.  By  the  use  of 


Fig. 32. — Reduced facsimile of card form for autopsy records.


fine  handwriting  a large  amount  of  information  can  be  put  on  the 
face of the card. If still further detail is wanted, the back of the
card  is  available. 

The  card  is  8^  x 11  inches  in  size.  The  upper  left-hand  corner  is 
clipped  off  to  facilitate  stacking.  The  cards  are  printed  on  heavy 
card  stock  of  four  different  colors,  to  make  easy  the  distinction 
of sex and color of the patients in tabulating. Records for white
male patients are put on white cards; for white females on pink
cards;  for  colored  males  on  green  cards,  and  for  colored  females  on 
yellow  cards. 

The  purpose  of  the  numbered  cells  around  the  edge  of  the  card 
is  to  furnish  the  guides  for  the  accurate  punching  of  holes  with  a 
hand  punch,  in  order  to  facilitate  the  sorting  out  of  groups  of  like 
cards.  For  example,  every  card  which  records  a case  showing  any 
malignant  neoplasm  has  a hole  punched  where  the  circle  is  printed 
in  cell  No.  50  at  the  bottom  of  the  card.  This  makes  it  possible  to 
assemble  at  any  time,  quickly  and  accurately,  all  the  cases  of 
malignancy in the material.


CHAPTER VI


GRAPHIC  REPRESENTATION  OF  STATISTICAL  DATA 

VALUE  OF  STATISTICAL  DIAGRAMS 

Diagrams properly constructed and intelligently used constitute
one of the most potent tools in the statistician’s armamentarium.
Even the most seductively constructed and arranged
table of statistics will not convey the story which inheres in the
figures with anything like the neatness and despatch attainable by
graphic presentation.

The  graphic  side  of  statistical  work  has  received  a great  deal  of 
attention  in  recent  years  and  there  are  several  excellent  treatises 
available,  dealing  solely  with  this  subject  (see  reading  list  at  the 
end  of  this  chapter).  Any  detailed  treatment  of  the  subject  is 
impossible  in  the  space  available  here.  The  attempt  will  be  only  to 
set  forth  a few  of  the  most  elementary  principles,  and  to  introduce 
the  reader  to  the  more  detailed  literature. 

GENERAL  CHARACTERISTICS 

Before developing the structure and uses of different types of
statistical diagrams it is desirable to say a word about their underlying
general characteristics.

All statistical diagrams are representations of points, lines, surfaces
or solids, the positions of which in space are quantitatively defined
by a system of co-ordinates.

These  co-ordinates  may  be  of  various  sorts.  The  most  common 
sort  are  rectangular  co-ordinates.  Here  a point  p in  a plane 
(Fig.  33)  has  its  position  defined  (as  indicated  by  the  dotted  lines) 
in  terms  of  the  x and  y axes  of  reference. 

The distance from o to the dotted line on the horizontal axis is
known conventionally as the abscissa of the point p. The distance
on the vertical axis from o to the dotted line is known as the ordinate
of the point p. The horizontal or x axis is the abscissal axis. The
vertical or y axis is the axis of ordinates, or the ordinal axis. Generally
and usually in plotting statistical data to rectangular axes
the classes of things are laid off as abscissas, and the frequencies of
these classes as ordinates. This, however, is only a convention,
and not a law of nature.

Besides  rectangular  co-ordinates,  there  are  sometimes  used  in 
statistical  diagrams: 

(a) Angular co-ordinates (as in “pie” diagrams).

(b)  Polar  co-ordinates. 


Fig.  33. — Diagram  to  illustrate  rectangular  co-ordinates.  0 is  the  origin.  The 
arrows  indicate  the  conventional  directions  relative  to  algebraic  signs. 

(c)  “Geographical”  co-ordinates  (as  in  a statistical  map,  where 
latitude  and  longitude  are  the  axes  of  reference,  really 
angular  co-ordinates  which  may  become  rectangular  by 
projection  to  a plane). 

TYPES  OF  DIAGRAMS 

The first question which anyone should ask himself who feels
an impulse to make a statistical diagram is this: What is to be
the fundamental purpose of this diagram? What is the essential
point that it is intended to convey to the viewer? The answer to
this question virtually settles the type of diagram to be employed,
because there is a rather definite adaptation of diagram types.
Some types of diagrams are much better fitted than others to the
telling of particular kinds of statistical stories.

Consider  the  following  scheme: 

A. Purpose: To represent frequencies of things or events.

   1. Categories or attributes of qualitative things, which do not vary
      continuously in the mathematical sense.

      Type of diagram: (a) Bar diagram (cf. Figs. 34 and 35).
                       (b) “Pie” diagram (cf. Fig. 36).
                       (c) Frequency polygon (Figs. 40, 41).*

   2. Things which vary continuously.

      Type of diagram: (a) Histogram (cf. Figs. 37-39).
                       (b) Frequency polygon (cf. Figs. 40, 41).
                       (c) Ogive curve (cf. Fig. 42).
                       (d) Integral curve (cf. Figs. 43, 44).

B. Purpose: To represent trends of events or things.

   1. In Time. Non-cyclic.

      Type of diagram: (a) Line diagram on arithlog grid (cf. Figs. 47, 48).
                       (b) Line diagram on arithmetic grid (cf. Figs. 45-47).†

   2. In Time. Cyclic.

      Type of diagram: (a) Line diagram on arithmetic grid (cf. Fig. 49).
                       (b) Polar co-ordinates (cf. Fig. 50).

C. Purpose: To show distribution of things or events.

      Type of diagram: (a) Spot map (cf. Fig. 51).
                       (b) Shaded map (cf. Fig. 52).
                       (c) Scatter diagram (cf. Fig. 53).

D. Purpose: To facilitate or replace computation.

      Type of diagram: (a) Nomogram (cf. Figs. 54-56).

Bar  Diagrams 

The bar diagram is the simplest possible picture of a statistical
situation. Figure 34 is a bar diagram‡ showing the proportion
which each of the more important foods contributes to the total
protein consumed in the United States by human beings.

* Strictly speaking, the frequency polygon belongs here rather than under 2b,
where it is also listed. The rigid statistical purist will use the frequency polygon only
to depict discontinuous variation. But in actual statistical practice it always has
been, and probably will be, usefully employed as a substitute or alternate for the
histogram, especially where it is desired to compare graphically several frequency
distributions in the same diagram.

† The logistic curve (cf. Chapter XVII) is a special case of this type of diagram.

‡ From R. Pearl, The Nation’s Food, Philadelphia, 1920, p. 237. Data on which
diagram is based are there given.

[Bar diagram: “Percentage Contribution to Total Protein Consumed”; horizontal scale in per cent, one pair of bars for each commodity.]

Fig.  34. — Diagram  showing  the  percentage  of  the  total  protein  consumed  in 
the  United  States  contributed  by  each  of  23  commodities.  The  solid  bars  denote 
the  average  consumption  in  the  six  years  preceding  our  entry  into  the  war.  The 
cross  hatched  bars  denote  the  consumption  in  1917  and  1918. 


From this diagram one sees at a glance the relative significance
of the great staple foods in furnishing protein for human consumption.
Wheat stands first. Beef contributes roughly one-half as much
protein to the national dietary as wheat, and poultry and eggs about
half as much as beef, etc. The whole story of the sources of the
protein we, as a people, consume is accurately visualized.

[Bar diagram: percentage scales running 0 to 100 along the top and 100 to 0 along the bottom; legend, symptom present and symptom absent. Symptoms shown: jaundice, bile-stained urine, nausea, clay-colored stools, anorexia, fever, constipation, headache, vomiting, prostration, abdominal pain, chills, limb pains, conjunctival congestion, unusual prevalence of rats on premises, diarrhea, hiccup, epistaxis, herpes.]

Fig.  35. — Bar  diagram  based  upon  data  of  Table  8,  Chapter  IV,  showing  the  relative 
frequency  of  different  symptoms  in  epidemic  jaundice. 

The  percentage  columns  of  Table  8 in  Chapter  IV  make  the  bar 
diagram  shown  in  Fig.  35.  This  is  a slightly  different  form  of  bar 
diagram from that shown in Fig. 34.

Bar diagrams find perhaps their most appropriate field of usefulness
in the graphic representation of discontinuous variates,
as is illustrated in the two examples here given. Wheat and dairy
products are discontinuous, discrete entities; one cannot start from
wheat and by a series of minute continuous steps or gradations
pass to dairy products. Similarly, jaundice and nausea are physically
discontinuous phenomena. Hence it is appropriate to
represent them graphically by physically separate bars. The case
is quite different with continuous variates. It is possible to pass
continuously by successive, unbroken small steps from a height of
60 inches say to a height of 65 inches. Hence it is proper to represent
such phenomena graphically by continuous lines. One
frequently sees bar diagrams in which each bar represents a physically
discrete phenomenon or entity, but in the diagram the ends
of the bars have been connected by a line. This is bad practice.
Its absurdity is evident if one tries to read a point on the line in
terms of abscissal or ordinal units. What is the meaning of something
half-way between wheat and dairy products?

“Pie”  Diagrams 

For  a reason  which  will  be  perfectly  obvious  to  all  American 
readers,  and  which  foreign  readers  have  no  occasion  to  be  interested 
in,  sector  diagrams  plotted  to  angular  co-ordinates  are  called 
colloquially  “pie”  diagrams.  An  example  of  such  a diagram  is 
seen  in  Fig.  36. 

While this form of diagram is extremely popular, especially in
exhibit work, I agree entirely with Brinton that it is a far less
desirable type than the simple bar diagram. Its use should probably
be confined strictly to popular presentation, as in exhibit and
propaganda work.

Histograms,  Frequency  Polygons,  Ogives. 

It will be desirable to consider this group of graphic forms
together, and because of their importance and frequent use the
methods of their construction from the original data will be treated
in detail. As material for this study of graphic representation the
data of Table 10 may be used. This table gives the head heights
in millimeters of 68 male inmates of the Haddington District
Asylum in Scotland, as reported by Tocher* (p. 39).

The  data  of  Table  10  (p.  171)  are  simply  a list  of  observations 
just  as  originally  presented  by  Tocher.  To  make  them  into  usable 
statistics  they  must  first  be  converted  into  a frequency  distribution 
in  which  like  head  heights  will  be  brought  together.  This  is  done 
in  Table  11  (p.  172). 


[Sector diagram: “Use of the Land, Present and Potential.”]


Fig. 36. — Example of diagram to angular co-ordinates. (Reproduced by permission
of Dr. O. E. Baker and the editor of the Geographical Review from an article
by  Dr.  Baker  entitled  “Land  Utilization  in  the  United  States:  Geographical  Aspects 
of  the  Problem,”  published  in  the  Geographical  Review,  vol.  13,  January,  1923.) 

It  is  evident  that  the  extent  of  variation  is  so  great  in  this 
character  height  of  head  that  a class  unit  of  1 mm.  is  too  fine.  It  is 
necessary  to  group  the  material  into  larger  class  units.  This  is 
done  in  the  third  column  of  the  table,  headed  “Frequencies  grouped 
in  5 mm.  classes.”  The  class  limits  are  taken  to  begin  on  the  even 
5 and  10  mm.  points. 
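
The passage from the raw list of Table 10 to the ungrouped and grouped frequencies of Table 11 may be sketched as follows. The 68 values are those of Table 10 in patient order; the variable names and the use of Python's `Counter` are merely one convenient way of doing the tally and are not part of the original treatment.

```python
from collections import Counter

# Head heights in mm., patients 1 to 68, from Table 10
HEAD_HEIGHTS = [
    137, 144, 132, 131, 131, 144, 145, 155, 125, 146, 143, 152, 137, 134,
    140, 137, 142, 138, 150, 141, 129, 137, 129, 140, 130, 143, 141, 126,
    134, 138, 139, 144, 128, 138, 142, 139, 138, 129, 139, 137, 139, 126,
    145, 143, 133, 137, 143, 125, 139, 131, 119, 134, 143, 149, 136, 150,
    141, 131, 143, 129, 131, 145, 133, 134, 125, 138, 130, 134,
]

CLASS_WIDTH = 5   # class limits begin on the even 5 and 10 mm. points

# Ungrouped frequencies: how often each exact head height occurs
ungrouped = Counter(HEAD_HEIGHTS)

# Grouped frequencies, keyed by the lower limit of each 5 mm. class
grouped = Counter((h // CLASS_WIDTH) * CLASS_WIDTH for h in HEAD_HEIGHTS)

if __name__ == "__main__":
    lowest = (min(HEAD_HEIGHTS) // CLASS_WIDTH) * CLASS_WIDTH
    highest = (max(HEAD_HEIGHTS) // CLASS_WIDTH) * CLASS_WIDTH
    for lower in range(lowest, highest + CLASS_WIDTH, CLASS_WIDTH):
        # a Counter reports 0 for a class with no observations
        print(f"{lower}-{lower + CLASS_WIDTH - 1}: {grouped[lower]}")
```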

* Tocher,  J.  F.:  Anthropometric  Survey  of  the  Inmates  of  Asylums  in  Scotland, 
Henderson Trust Reports, vol. i, Edinburgh, 1905.

“Histogram” is the name given by Pearson to the correct
graphical representation of frequency distributions. In these
diagrams the class limits are laid off on the abscissal axis, and the
frequencies over each abscissal element are given as the areas of

TABLE 10

Tocher’s Data on Head Height of Male Inmates of Haddington District Asylum

Patient    Head height,      Patient    Head height,
  No.          mm.             No.          mm.

   1           137              35          142
   2           144              36          139
   3           132              37          138
   4           131              38          129
   5           131              39          139
   6           144              40          137
   7           145              41          139
   8           155              42          126
   9           125              43          145
  10           146              44          143
  11           143              45          133
  12           152              46          137
  13           137              47          143
  14           134              48          125
  15           140              49          139
  16           137              50          131
  17           142              51          119
  18           138              52          134
  19           150              53          143
  20           141              54          149
  21           129              55          136
  22           137              56          150
  23           129              57          141
  24           140              58          131
  25           130              59          143
  26           143              60          129
  27           141              61          131
  28           126              62          145
  29           134              63          133
  30           138              64          134
  31           139              65          125
  32           144              66          138
  33           128              67          130
  34           138              68          134

rectangles erected on these base elements. So long as the base
elements (that is, sizes of the classes into which the material is
grouped) are all equal, then obviously the heights of the rectangles
will be proportionate to the frequency.

Suppose now we plot as a histogram the data of the first (ungrouped)
half of Table 11. The result will be that shown in Fig. 37.
Now it is at once evident that Fig. 37 is an inadequate and
TABLE 11

Frequency Distribution of Head Heights from Table 10

Head heights,   Ungrouped       Frequencies grouped   Class limits for group
    mm.         frequencies.    in 5 mm. classes.     frequencies, mm.

    119              1                   1                  115-119

    120
    121
    122                                                     120-124
    123
    124

    125              3
    126              2
    127                                 10                  125-129
    128              1
    129              4

    130              2
    131              5
    132              1                  15                  130-134
    133              2
    134              5

    135
    136              1
    137              6                  17                  135-139
    138              5
    139              5

    140              2
    141              3
    142              2                  16                  140-144
    143              6
    144              3

    145              3
    146              1
    147                                  5                  145-149
    148
    149              1

    150              2
    151
    152              1                   3                  150-154
    153
    154

    155              1                   1                  155-159

    Totals          68                  68

misleading graphical representation of the important facts about
variation in head height in this group of people. It is a long, flat
thing with many gaps and only roughly indicates what general
sorts of head heights occur most frequently. The grouping, in

[Histogram: horizontal scale, head height in mm.]


Fig.  37. — Histogram  of  ungrouped  frequencies  of  head  height  from  Table  11. 


[Histogram: horizontal scale, head height in mm., marked from 115 to 160 by steps of 5.]

Fig.  38. — Histogram  of  grouped  frequencies  of  head  height  from  Table  11. 

short, is too fine for so small a sample as 68. A much clearer and
more adequate idea of the real state of the case is given in Fig. 38,
which is a histogram plotted from the grouped data of the latter
half of Table 11.

From  this  diagram  an  adequate  picture  is  obtained  of  the  real 
distribution  of  head  heights  in  this  group.  The  skewness  of  the 
distribution  is  apparent.  Another  example  of  a histogram  is  seen 
in Fig. 78 infra. A method of drawing a histogram which is
preferred  by  some  statisticians  is  that  shown  in  Fig.  39.  It  will 
be  seen  to  consist  simply  in  the  omission  of  that  part  of  the 
vertical  grid  work  of  the  drawing  which  lies  below  the  top  of  the 


[Histogram: horizontal scale, head height in mm.]

Fig.  39. — Alternative  form  of  histogram  shown  in  Fig.  38. 


lower  of  each  pair  of  adjacent  rectangles.  It  is  an  attempt  to 
realize  the  advantages,  for  comparative  purposes,  of  the  frequency 
polygon  without  at  the  same  time  sacrificing  the  complete  math- 
ematical accuracy  of  the  histogram. 
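
The mathematical accuracy of the histogram rests on the rule stated earlier, that frequency is carried by the area of each rectangle, so that heights are proportionate to frequency only while the class widths are equal. A brief sketch of that rule follows, using the grouped frequencies of Table 11. The regrouping of the two highest classes into one 10 mm. class is hypothetical and is introduced here solely to show what happens when widths differ; the function names are likewise assumptions.

```python
from fractions import Fraction

# (lower limit, upper limit, frequency) for the 5 mm. classes of Table 11
CLASSES = [
    (115, 120, 1), (120, 125, 0), (125, 130, 10), (130, 135, 15),
    (135, 140, 17), (140, 145, 16), (145, 150, 5), (150, 155, 3),
    (155, 160, 1),
]

# HYPOTHETICAL regrouping: the two highest classes merged into one
MERGED = CLASSES[:-2] + [(150, 160, 4)]


def rectangle_heights(classes):
    """Height = frequency / class width, so that area equals frequency."""
    return [Fraction(freq, upper - lower) for lower, upper, freq in classes]


def total_area(classes):
    """Sum of rectangle areas; this should equal the number of observations."""
    heights = rectangle_heights(classes)
    return sum(h * (upper - lower)
               for h, (lower, upper, _) in zip(heights, classes))


if __name__ == "__main__":
    print([str(h) for h in rectangle_heights(CLASSES)])
    print(total_area(CLASSES), total_area(MERGED))
```

With equal 5 mm. classes every height is simply the frequency divided by five; in the merged class the height falls to 4/10 although the area, and hence the frequency represented, remains 4.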

While  the  histogram  is,  on  theoretic  grounds,  the  most  accurate 
method  of  graphically  representing  frequency  distributions,  it  is 
sometimes  more  practically  useful  to  represent  them  as  frequency 
polygons. 

A frequency polygon is the result that one gets by assuming
that the total frequency in any given class is concentrated at the
center of that class, and then plotting ordinates of height proportionate
to the frequencies supposed concentrated at those midpoints.
The histogram of Fig. 38 is shown plotted as a frequency polygon
in Fig. 40.

The  frequency  polygon  is  less  accurate  than  the  histogram 
because  it  does  not  truly  represent  the  frequency  areas  over  the 
base  elements.  But  it  is  an  extremely  useful  form  of  frequency 
diagram  for  comparative  purposes.  It  may  be  employed  freely 


[Frequency polygon: horizontal scale, head height in mm., marked from 110 to 165 by steps of 5.]

Fig.  40. — Frequency  polygon  of  grouped  frequencies  of  head  heights  from  Table  11. 

in  place  of  the  histogram  where  the  only  object  is  to  give  a general 
picture  to  the  eye  of  a series  of  overlapping  frequency  distributions. 
An  example  of  such  comparative  use  is  shown  in  Fig.  41. 
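
The construction of a frequency polygon from grouped data may be sketched as below. Two points are assumptions and are labeled as such in the code: the class center is taken as the lower limit plus half the class width, and the polygon is closed on the base line at the centers of the empty classes next beyond each end. The function name `polygon_points` is introduced here for illustration.

```python
LOWER_LIMITS = [115, 120, 125, 130, 135, 140, 145, 150, 155]
FREQUENCIES = [1, 0, 10, 15, 17, 16, 5, 3, 1]   # grouped column of Table 11
WIDTH = 5


def polygon_points(lower_limits, frequencies, width):
    """Return the (abscissa, ordinate) vertices of the frequency polygon."""
    # ASSUMPTION: class center = lower limit + half the class width
    points = [(lower + width / 2, freq)
              for lower, freq in zip(lower_limits, frequencies)]
    # ASSUMPTION: close the polygon at zero in the adjoining empty classes
    first = (lower_limits[0] - width / 2, 0)
    last = (lower_limits[-1] + 3 * width / 2, 0)
    return [first] + points + [last]


if __name__ == "__main__":
    for x, y in polygon_points(LOWER_LIMITS, FREQUENCIES, WIDTH):
        print(x, y)
```

Several distributions treated in this way can be drawn on one pair of axes, which is the comparative use for which the polygon is recommended.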

Another method of representing frequency distributions graphically
was devised by Galton, and the resulting type of curve was
called by him the “ogive.” It is the sort of curve which would
be got if 1000 men taken at random were arranged in a row in
order of their heights, beginning with the shortest at one end,
and ending with the tallest at the other. If now a smooth line be
imagined just touching the top of the head of each man in the
row, this line would be an ogive curve, in Galton’s sense. The
data of Table 11 are plotted as an ogive curve in Fig. 42.

It is seen that in this curve the head heights in millimeters
are now taken as ordinates, and at equal intervals along the abscissal
axis there is erected an ordinate for each of the 68 individuals.
If a larger number of individuals were involved the curve would
be smoother. The curve is seen to be like the mirror image of an
enormously stretched out and elongated S, or an integral sign, lying
on its back.

Fig. 41. — Frequency polygons showing the age distribution of dead mothers of
dead (a) tuberculous (solid line) and (b) non-tuberculous (broken line) individuals.
(Reproduced from Pearl, R., “The Age at Death of the Parents of the Tuberculous
and the Cancerous,” Amer. Jour. Hygiene, vol. 3, pp. 71-89, 1923.)

Integral  or  Cumulated  Frequency  Diagrams 

So far in the discussion of the graphic representation of
frequencies, we have plotted the value of each single frequency, by


GRAPHIC  REPRESENTATION  OF  STATISTICAL  DATA  1 77 


itself,  against  its  proper  abscissa.  Let  us  consider  now  the  integral 
or  accumulated  diagram  of  frequency.  In  this  case  the  frequency 
is successively accumulated, class by class, from the lower range
end on. The data of Table 11 are put in this form in Table 12.
The integral curve plotted from the data of Table 12 is shown in
Fig. 43.

TABLE 12

Cumulated Frequency Distributions, Absolute and Percentage, of the Head
Heights from Table 11

Head height,     Cumulated frequencies
mm.              Observed    Percentage
119                  1           1.5
120                  1           1.5
121                  1           1.5
122                  1           1.5
123                  1           1.5
124                  1           1.5
125                  4           5.9
126                  6           8.8
127                  6           8.8
128                  7          10.3
129                 11          16.2
130                 13          19.1
131                 18          26.5
132                 19          27.9
133                 21          30.9
134                 26          38.2
135                 26          38.2
136                 27          39.7
137                 33          48.5
138                 38          55.9
139                 43          63.2
140                 45          66.1
141                 48          70.6
142                 50          73.5
143                 56          82.3
144                 59          86.7
145                 62          91.1
146                 63          92.6
147                 63          92.6
148                 63          92.6
149                 64          94.1
150                 66          97.0
151                 66          97.0
152                 67          98.5
153                 67          98.5
154                 67          98.5
155                 68         100.0
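The accumulation just described may be sketched in a few lines of Python. This is an illustration only: the class frequencies are not quoted from Table 11 but are recovered here by differencing the "Observed" column of Table 12, and the variable names are arbitrary. Percentages computed in this way are rounded, and a few differ by 0.1 from the printed values.

```python
from itertools import accumulate

# Head heights 119 .. 155 mm, one class per millimeter.
head_heights = list(range(119, 156))

# Class frequencies, recovered by differencing the "Observed" column of
# Table 12 (an assumption of this sketch, not a quotation of Table 11).
frequencies = [1, 0, 0, 0, 0, 0, 3, 2, 0, 1, 4, 2, 5, 1, 2, 5, 0, 1, 6,
               5, 5, 2, 3, 2, 6, 3, 3, 1, 0, 0, 1, 2, 0, 1, 0, 0, 1]

# Accumulate class by class from the lower end of the range.
cumulated = list(accumulate(frequencies))
total = cumulated[-1]

# Express each cumulated frequency as a percentage of the whole group.
percentage = [round(100 * c / total, 1) for c in cumulated]

for height, c, p in zip(head_heights, cumulated, percentage):
    print(f"{height:>4} {c:>4} {p:>6.1f}")
```

Plotting `cumulated` against `head_heights` gives the integral curve; plotting the sorted individual heights against their rank gives the ogive.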
Fig. 42. — Ogive of ungrouped frequencies of head height, from Table 11.

Fig. 43. — Integral curve of ungrouped frequencies of head height from Table 12.

This form of diagram shows the number of individuals having
a head height greater or smaller than any assigned value. This
property is often useful. This integral form of diagram may, by
a simple device discussed in detail by von Huhn3 be made to show
relative as well as, and along with, absolute accumulated
frequencies. In Fig. 43, 68 individuals are 100 per cent. of this
particular group or sample. Suppose, then, there is set up on the
right-hand margin a division of the ordinal distance (= 68
individuals = 100 per cent.) into 10 equal parts. This scale will
then be a percentage or relative scale, while that on the left-hand
margin still remains an absolute scale for frequencies in the same
group. The resulting diagram is shown as Fig. 44.

Fig. 44. — Like Fig. 43, but with added scale of relative or percentage frequencies.

The  advantages  of  this  form  of  diagram  are  at  once  apparent. 
It is seen, for example, that 90 per cent. of the group had head
heights under 145 mm.; 10 per cent. were under 128 mm. in head
height,  etc.  In  a wide  range  of  cases  plotting  in  this  manner 
will  obviate  all  necessity  of  calculating  percentages. 
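What the eye reads from the percentage scale of such a diagram can also be looked up directly in the cumulated column. The following sketch assumes that each entry of Table 12 counts the individuals up to and including the stated head height; the function name is hypothetical. Because it works from the tabulated classes rather than from a smoothed curve, its answers agree only approximately with readings taken from the figure.

```python
# "Observed" cumulated frequencies of Table 12, for head heights 119..155 mm.
head_heights = list(range(119, 156))
cumulated = [1, 1, 1, 1, 1, 1, 4, 6, 6, 7, 11, 13, 18, 19, 21, 26, 26, 27,
             33, 38, 43, 45, 48, 50, 56, 59, 62, 63, 63, 63, 64, 66, 66,
             67, 67, 67, 68]
total = cumulated[-1]  # 68 individuals = 100 per cent.


def height_reaching(percent):
    """Smallest head height at which the cumulated frequency
    reaches the given percentage of the whole group."""
    for height, count in zip(head_heights, cumulated):
        # Integer comparison avoids floating-point rounding issues.
        if 100 * count >= percent * total:
            return height
    raise ValueError("percent must not exceed 100")


for p in (10, 50, 90):
    print(p, "per cent. reached at", height_reaching(p), "mm.")
```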

The  student  will  note  that  the  ogive  and  integral  forms  of 
plotting  a frequency  distribution  are  fundamentally  the  same. 
The only essential difference between Figs. 42 and 43 is that in the
case of the ogive (Fig. 42) frequencies are plotted along the abscissal
axis, and in the integral (Fig. 43) along the y axis as usual.
Also  the  scale  of  plotting  is  a little  different  in  the  two  diagrams. 

Non-cyclic  Time  Trend  Diagrams 

One of the commonest uses of the graphic method in statistics
is to show the trend of events in time. The obviously simple way
to do this is to make a line diagram with time as abscissa and the
frequency of occurrence of the event in question as ordinate. Thus
suppose it is desired to show the decline in the death-rate from
typhoid fever in Baltimore from 1889 to 1919 inclusive. A diagram
like that shown in Fig. 45 may be prepared.

Fig. 45. — Death-rate from typhoid in Baltimore 1889-1919 inclusive for males,
females, and total population. (From Howard, W. T., “The Natural History of
Typhoid Fever in Baltimore, 1851-1919,” Johns Hopkins Hospital Bulletin, vol. 31,
pp. 276-286, 319-334, 1920.)

Now  it  would  appear  at  first  glance  that  this  diagram  gave  an 
adequate  representation  of  the  facts.  We  see  the  line  indicating 
a decline  in  the  rate  from  about  55  to  under  10  in  the  period  covered. 
But actually the diagram is visually misleading. Why and how
it is so will now be shown. Suppose we wish to compare the decline
in  the  death-rate  from  tuberculosis  of  the  lungs  with  that  in  the 
death-rate  from  typhoid  fever.  Let  us  transfer  from  Baltimore 
as  a universe  of  discourse  to  the  United  States  Registration  Area. 
In Table 13 are given the death-rates per 100,000 in the original
registration states (Connecticut, Indiana, Maine, Massachusetts,
Michigan, New Hampshire, New Jersey, New York, Rhode Island,
and Vermont, and the District of Columbia) for each year from
1900 to 1920 inclusive, for the causes of death (a) tuberculosis
(all forms) and (b) typhoid fever. The data are taken from
Mortality Statistics, 1916, p. 21 (rates for years 1900 to 1909 inclusive),
and 1920, p. 19 (rates for years 1910 to 1920 inclusive). The
reason for confining attention to the original registration states is
that the area and population at risk may be comparable throughout.

TABLE 13

Death-rates per 100,000 Population in the Original Registration States
1900 to 1920 Inclusive

Year      (a) Tuberculosis     (b) Typhoid
          (all forms)          fever
1900          195.2               31.3
1901          189.8               27.5
1902          174.1               26.3
1903          177.1               24.6
1904          188.5               23.9
1905          180.9               22.4
1906          177.8               22.0
1907          175.6               20.5
1908          169.4               19.6
1909          163.3               17.2
1910          164.7               18.0
1911          159.0               15.3
1912          149.8               13.2
1913          148.7               12.6
1914          148.6               10.8
1915          146.7                9.2
1916          143.8                8.8
1917          147.1                8.1
1918          151.0                7.0
1919          124.9                4.8
1920          112.0                5.0

Using the same graphic methods as in Fig. 45 and the data
from Table 13 we get the result shown in Fig. 46.

From this diagram the conclusion which one’s eye draws at
once is that the decline in the tuberculosis rate has been much
more rapid during this period than in the typhoid rate. The
tuberculosis line seems to slope downward much more steeply.

But is the conclusion implied by this apparent difference in
slope correct? The diagram presented in Fig. 46 does not enable
an easy, direct answer to the question. Why it does not will
be perceived if the following considerations are taken into account.

Fig. 46. — Death-rates from (a) tuberculosis (all forms) and (b) typhoid fever in the
Registration Area, 1900-1920 inclusive. Arithmetic grid.

[Suppose that in each of six places, in a given interval]
of time from a to b there occurred exactly 25 per cent. reduction
in  the  number  of  deaths  from  a particular  cause.  But  suppose 
further  that,  owing  to  the  different  absolute  sizes  of  the  places, 
the  actual  numbers  of  deaths  which  occurred  in  each  of  the  six 
places,  at  the  beginning  of  the  period  (time  a)  were  respectively 
5000,  4000,  3000,  2000,  1000,  and  100.  If  then  there  was,  as 
premised  above,  a reduction  in  mortality  in  the  time  period  a 
to  b of  exactly  25  per  cent.,  the  numbers  of  deaths  occurring  at 
time  b would  be  for  the  six  places  as  follows:  3750,  3000,  2250, 
1500, 750, 75. Now suppose this hypothetic case to be plotted
on an arithmetic grid as is Fig. 46. The result will be as shown in
Fig. 47, A.

Anyone looking at this diagram would surely conclude that the
decline in mortality had been much more rapid in the first
community than in the last. Yet exactly the same rate of decline
(25 per cent.) was, by hypothesis, obtained in all the places. To
produce a result visually correct all the lines ought to be parallel.
But plainly such a result cannot be attained by plotting these
data on an arithmetic grid.

Fig. 47. — A, Diagram on arithmetic grid to show result of 25 per cent. reduction
in mortality in each of six places of different size. Hypothetic case. B, Showing the
result of plotting the same data as in A on an arithlog grid.

Suppose  now  that  the  same  data  be  plotted  on  a paper  with  a 
grid  ruling  such  that,  while  the  abscissal  scale  is  still  graduated  in 
arithmetic  progression  (i.  e.,  with  equally  spaced  steps),  the 
scale  of  the  ordinates  is  divided  not  in  arithmetic  progression, 
but in proportion to the logarithms of numbers in arithmetic
progression. Such a ruling is called an arithlog or semi-logarithmic
grid. The result is shown in Fig. 47, B.

It is evident that there has been an almost magical
transformation. The 25 per cent. reduction lines are now all parallel,
as they ought to be if the diagram is to tell a visually correct story,
and surely it is idle to plot diagrams if they are to tell a visually
incorrect story when finished. For a diagram is plainly
something to be looked at. It produces its results visually.
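The reason for the transformation can be checked numerically. The sketch below uses the six hypothetic places of the text; the variable names are arbitrary. On an arithmetic grid the vertical drop of each line is the difference of the two numbers, which varies with the size of the place; on an arithlog grid it is the difference of their logarithms, which is the same for every place.

```python
import math

# Deaths at time a in the six hypothetic places, and at time b after a
# reduction of exactly 25 per cent. in each.
deaths_a = [5000, 4000, 3000, 2000, 1000, 100]
deaths_b = [0.75 * d for d in deaths_a]

# Vertical drop of each line on an arithmetic grid.
arithmetic_drop = [b - a for a, b in zip(deaths_a, deaths_b)]

# Vertical drop of each line on an arithlog grid.
arithlog_drop = [math.log10(b) - math.log10(a)
                 for a, b in zip(deaths_a, deaths_b)]

for a, d1, d2 in zip(deaths_a, arithmetic_drop, arithlog_drop):
    print(f"{a:>5}  arithmetic {d1:>8.1f}   arithlog {d2:.4f}")
```

Every arithlog drop equals log 0.75, so the six lines are parallel, whereas the arithmetic drops range from 1250 down to 25.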

It  will  be  well  now  to  go  back  and  replot  the  data  of  Fig.  46 
on  an  arithlog  grid.  The  result  is  that  shown  in  Fig.  48. 


Fig. 48. — Death-rates from (a) tuberculosis (all forms) and (b) typhoid fever in
the original registration states, 1900-1920 inclusive. Arithlog grid. Compare
with Fig. 46.


The correct conclusion is now apparent. Typhoid fever
mortality has declined at a much more rapid rate in the period covered
than has tuberculosis mortality. And the fact is immediately
apparent visually, as it ought to be if a diagram is used at all.
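The same conclusion follows from the first and last rows of Table 13. The sketch below is illustrative only and its function names are hypothetical. It contrasts the absolute fall in each rate, which governs the slope on the arithmetic grid, with the proportional fall, which governs the slope on the arithlog grid.

```python
import math

# Death-rates per 100,000 in 1900 and in 1920, from Table 13.
tuberculosis = (195.2, 112.0)
typhoid = (31.3, 5.0)


def absolute_drop(first, last):
    """Fall in the rate itself (what the arithmetic grid shows)."""
    return first - last


def percent_drop(first, last):
    """Fall as a percentage of the starting rate."""
    return 100 * (first - last) / first


def arithlog_drop(first, last):
    """Fall in the logarithm of the rate (what the arithlog grid shows)."""
    return math.log10(first) - math.log10(last)


for name, (first, last) in (("tuberculosis", tuberculosis),
                            ("typhoid fever", typhoid)):
    print(name,
          round(absolute_drop(first, last), 1),
          round(percent_drop(first, last), 1),
          round(arithlog_drop(first, last), 3))
```

Tuberculosis shows the larger absolute fall, but typhoid fever shows much the larger proportional fall.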

The advantages of the arithlog grid when trends are to be
represented graphically have been emphasized by all recent American
writers in this field, notably by Fisher,4 Field,5 and Whipple and
Hamblen.6 The papers of Fisher and Field especially should be
carefully read by the student for the full and scholarly discussion
of this matter which they give.

Fisher  sums  up  the  advantages  of  this  method  of  plotting  trends 
(he  calls  a chart  on  an  arithlog  grid  a “ratio  chart”)  as  follows: 

“The  eye  reads  a ratio  chart  more  rapidly  than  a difference 
chart  or  a table  of  figures.  We  may  recapitulate  what  most  easily 
catches  the  eye  as  follows: 

“1.  If  we  see  a curve  ascending,  and  nearly  straight,  we  know 
that  the  statistical  magnitude  it  represents  is  increasing  at  a nearly 
uniform  rate. 

“2. If the curve is descending, and nearly straight, the
statistical magnitude is decreasing at a nearly uniform rate.

“3.  If  the  curve  bends  upward  the  rate  of  growth  is  increasing. 

“4.  If  downward,  decreasing. 

“5.  If  the  direction  of  the  curve  in  one  portion  is  the  same  as 
in  some  other  portion  it  indicates  the  same  percentage  rate  of 
change  in  both. 

“6.  If  the  curve  is  steeper  in  one  portion  than  in  another  portion 
it  indicates  a more  rapid  rate  of  change  in  the  former  than  in  the 
latter. 

“7.  If  two  curves  on  the  same  ratio  chart  run  parallel  they 
represent  equal  percentage  rates  of  change. 

“8.  If  one  is  steeper  than  another  the  first  is  changing  at  a 
faster  percentage  rate  than  the  second. 

“9.  The  imaginary  straight  line  most  nearly  representing,  to 
the  eye,  the  general  trend  of  the  curve,  is  its  ‘growth  axis,’  and 
represents  the  average  rate  of  increase  (or  decrease);  and  the 
deviations  of  the  curve  from  this  growth  axis  are  plainly  evident 
without  recharting. 

“10.  The  slope  of  the  imaginary  line  between  any  two  points 
on  a curve  indicates  the  average  rate  of  change  between  the  two.” 
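Fisher's tenth point can be made concrete with the end values of Table 13. In the sketch below, which is an illustration and not part of Fisher's paper, the slope of the straight line joining two points on a ratio chart is converted into an average percentage rate of change per year; the function name is hypothetical.

```python
def average_annual_rate(first, last, years):
    """Uniform yearly rate of change that carries `first` to `last`
    in the given number of years (negative for a decline)."""
    return (last / first) ** (1 / years) - 1


# Death-rates per 100,000 in 1900 and 1920 (Table 13), 20 years apart.
tuberculosis_rate = average_annual_rate(195.2, 112.0, 20)
typhoid_rate = average_annual_rate(31.3, 5.0, 20)

print(f"tuberculosis:  {100 * tuberculosis_rate:.1f} per cent. per year")
print(f"typhoid fever: {100 * typhoid_rate:.1f} per cent. per year")
```

On this reckoning the typhoid rate fell, on the average, about three times as fast per year as the tuberculosis rate.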

Whipple  and  Hamblen  particularly  discuss  the  use  of  this  type 
of  diagram  in  public  health  work. 

Cyclic  Time  Trend  Diagrams 

A cyclic event is one whose frequency of occurrence varies in
an orderly recurring manner. An example is found in the seasonal
incidence of various diseases, as shown in Fig. 49 for whooping-cough
in Philadelphia and New York City.

Fig. 49. — Average weekly case rates from whooping cough, New York City and
Philadelphia, 1906-1912.

This  diagram  shows  clearly  that  whooping-cough  reaches  its 
maximum  incidence  in  the  late  spring  months,  and  is  less  frequent 
at  other  periods  of  the  year. 

Fig. 50. — Diagram showing time of harvesting of principal sugar crops of the world.
(Reproduced from source indicated in text, by permission of Mr. Earl D. Babst.)

A method of plotting such cyclic events sometimes used is
through the employment of polar co-ordinates. In this type of
diagram the frequencies corresponding to a given time are laid off as
ordinates radiating from a central, polar point. On account of the
greater familiarity which generally exists with regard to diagrams
of the type of Fig. 49 these are perhaps to be preferred in ordinary
statistical work to polar co-ordinate diagrams for cyclic events.
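The laying off of frequencies on polar co-ordinates may be sketched as follows. The monthly figures are hypothetical, invented purely for illustration, and the function name is arbitrary. Each month is given an equal angular step around the pole, and the frequency for that month is measured outward along the corresponding ray.

```python
import math

# Hypothetical monthly case counts, January to December (illustration only).
monthly_cases = [40, 45, 60, 80, 95, 90, 70, 55, 45, 40, 38, 39]


def polar_points(values):
    """Return (x, y) points: angle marks the time of year,
    distance from the pole marks the frequency."""
    n = len(values)
    points = []
    for k, radius in enumerate(values):
        theta = 2 * math.pi * k / n  # equal angular step for each month
        points.append((radius * math.cos(theta), radius * math.sin(theta)))
    return points


points = polar_points(monthly_cases)

# Join December back to January so that the cycle closes on itself.
outline = points + [points[0]]
```

The list `outline` can be passed to any plotting routine that draws a line through successive points.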
An interesting and useful method of showing graphically the
time relations of certain kinds of cyclic phenomena is presented
in Fig. 50. This diagram, taken from the Annual Report for 1922
of the American Sugar Refining Company, shows the time
relations of harvesting of the principal sugar crops of the world, the
sizes of the respective crops being plotted to polar co-ordinates.

Fig. 51. — World map of activities of International Health Board during 1920.
(Reproduced by permission of Mr. Wickliffe Rose from Seventh Ann. Rept.
International Health Board, 1921.)

Fig. 52. — Organization and activities of Commission for the Prevention of
Tuberculosis in France: 1. Work of educational division, showing departments visited by
traveling exhibits during 1918, 1919, and 1920. 2. Work of division of departmental
organization, showing departments in which antituberculosis organization has been
effected or is in progress. 3. Number of tuberculosis dispensaries in each department
co-operating with the Commission on December 31, 1920. 4. Total number of
tuberculosis dispensaries functioning, in process of organization, or in project at the
end of 1920. (Reproduced by permission of Mr. Wickliffe Rose from Seventh Ann.
Rept. International Health Board, 1921.)

Legend of Fig. 52. Map 1: departments visited in 1918; departments visited in 1919;
departments visited in 1920; departments unfinished in 1920. Map 2: departments
surveyed and organized in 1918-1919; departments surveyed and organized in 1920;
departments surveyed and in process of organization; departments in process of
organization, partially surveyed or in correspondence; departments surveyed and
organized by the medical bureau. Map 3: number of dispensaries in each department.
Map 4: 271 dispensaries functioning on Dec. 31, 1920; 88 dispensaries in process of
organization; 90 dispensaries in project.

Statistical  Maps 

Maps  may  be  usefully  employed  for  the  graphic  presentation 
of certain types of data. Such maps are of two types in the main:
(a) Spot maps and (b) shaded or colored maps.

In  the  spot  map  the  locality  of  occurrence  of  an  event  is  indicated 
by  a properly  located  dot  on  the  map.  This  type  of  map  is  much 
used  in  epidemiologic  work.  An  example  of  such  a map  is  given 
in  Fig.  51,  showing  the  distribution  of  the  different  sorts  of  activities 
of  the  International  Health  Board  in  1920. 

Figure  51  illustrates  that  by  using  different  sorts  of  “spots”  one 
can  indicate  a number  of  facts  and  relations  on  the  same  spot  map. 

In  shaded  maps  different  types  of  shading  or  coloring  of  areas 
are  used  to  bring  out  statistical  facts.  Figure  52  gives  examples 
of  such  maps,  as  well  as  another  instance  of  a spot  map. 

Scatter  Diagrams 

For certain purposes it is useful to employ the device of placing
dots instead of lines in a reference plane of rectangular co-ordinates
to show the distribution of individual events or values. This scheme
has particularly been used by biometricians to show graphically
the distribution of individual variation of organisms relative to
two correlated variables. Such a diagram brings out clearly the
“scatter” of the individual variates, whence the name of this type
of diagram is derived. They are also sometimes called “spot”
diagrams. An example is given in Fig. 53.

Fig. 53. — Scatter diagram showing the correlation between indices of aggregation
of population (per cent. urban) and age distribution of deaths from measles, 1917-24,
in 36 registration states, southern states (circled) included. (From Doull, J. A.:
Amer. Jour. Hygiene, vol. 8, p. 635, 1928.)

Scatter  diagrams  are  to  be  regarded,  in  general,  as  preliminary, 
graphic  aids,  primarily  useful  in  the  working  stage  of  a research, 
rather  than  as  finished  exhibits  in  the  final  presentation  of  results. 
However,  in  certain  cases,  such  as  the  one  illustrated  in  Fig.  53, 
they  are  valuable  in  emphasizing  the  distribution  of  the  individual 
observations  in  the  published  presentation. 

Nomograms 

Up  to  this  point  in  the  discussion  of  graphic  methods  every 
case  has  dealt  with  the  plotting  of  but  two  variables.  Nomography 
is a development of graphic methods which permits the
representation of theoretically n variables upon a plane surface. The
invention  of  co-ordinate  geometry  was  due  to  Descartes,  who 
developed  the  idea  of  representing  graphically  two  variables  in 
a plane.  Buache,  in  1752,  showed  that  a third  variable  could  be 
added  by  the  use  of  contour  lines.  D’Ocagne9  hit  upon  the  idea 
of  collinear  points  as  furnishing  a method  of  dealing  graphically 
with  n variables  in  a plane.  To  him  is  due  the  name  “nomography,” 
which  is  given  to  this  branch. 

The  outstanding  usefulness  of  nomography  is  to  facilitate  the 
numerical  solution  of  complex  mathematical  expressions  and 
relations.  An  example  of  a nomogram  for  this  purpose  is  to  be 
found on page 34 of Pearson’s “Tables for Statisticians and
Biometricians.”

Space  is  lacking  here  for  any  detailed  development  of  this 
subject.  The  statistician  and  the  medical  man  will,  however, 
do  well  to  master  it,  because  it  has  many  important  applications 
in  these  fields.  The  best  brief  account  in  English  is  that  of  Hezlet.7 
Brodetsky’s8 book is a sound, if pedagogically somewhat inept,
introduction to the subject. An elementary treatise which is in
some respects much better is that of Frechet and Roullet.11
D’Ocagne’s9 own writings are, of course, the final authority, but
not particularly adapted to the medical man with a meager
equipment of mathematics. Also the two-volume treatise by Soreau12
may be consulted.

A single  example,  of  the  simplest  possible  character,  may  be 
given here to indicate in some measure what a nomogram
fundamentally is, and the logic underlying the construction of
nomograms. Suppose we wish to set up a nomogram for the graphic
solution  of  the  expression 

x = a + b 

Lay  off  on  two  parallel  lines  scales  with  equally  spaced  divisions. 
The  scales  may  be  divided  with  any  desired  degree  of  fineness, 
may  be  of  any  length  one  pleases,  and  may  be  as  far  apart  (or  near 
together)  as  one  pleases.  One  of  these  scales  will  be  the  a scale 
(i.  e.,  that  upon  which  values  of  a are  to  be  read)  and  the  other 
the  b scale.  Now,  plainly,  it  will  be  possible  to  draw  somewhere 
between  the  a and  b lines  of  Fig.  54  a third  line  parallel  to  the 
other  two,  and  so  graduated  that  if  a straight-edge  connects  any 
value  on  a with  any  value  on  b the  point  where  the  straight-edge 
crosses x will give a reading on x which will satisfy the relation
x = a + b. The problem is to find the location of the x line
and  its  graduation.  To  do  this  is  very  simple,  as  shown  in 
Fig.  54. 

We  know  that 

when a = -20 and b = +15, x = -5
     a = +15 and b = -20, x = -5

If  then  we  draw  straight  lines  connecting  these  two  particular 
values  of  a with  the  two  connected  values  of  b,  the  point  where 
these  two  lines  cross  each  other  must,  in  the  first  place,  lie  on  the 
x line,  and  in  the  second  place  must  be  the  point  on  that  line  which 
is to be graduated -5. Again, we know that

when a = +5 and b = 0, x = +5
     a = -10 and b = +15, x = +5

Draw  these  lines,  and  we  shall  have  determined  a second  point 
on  the  x line.  Two  points  being  sufficient,  we  have  now  located 
the  position  in  space  and  the  direction  of  the  x line.  Its  further 
graduation may be wrought out by continuation of the same
process, though to do it that way would be a highly unintelligent
procedure in the case of so simple a relationship.
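The construction can be followed numerically. The sketch below rests on assumptions not stated in the text: the a scale is placed at horizontal position 0 and the b scale at position 1, both graduated in the same units with their zero points level. The function names are hypothetical. Each straight-edge is the line joining a value on a to a value on b, and the crossing point of two such lines that give the same sum is computed.

```python
def crossing(a1, b1, a2, b2, gap=1.0):
    """Point where the line joining a1 to b1 crosses the line joining
    a2 to b2. The a scale stands at position 0, the b scale at `gap`.
    Returns (horizontal position, height)."""
    slope1 = b1 - a1
    slope2 = b2 - a2
    if slope1 == slope2:
        raise ValueError("the two straight-edge lines are parallel")
    t = (a2 - a1) / (slope1 - slope2)  # fraction of the way from a to b
    return t * gap, a1 + slope1 * t


def x_graduation(height):
    """Value to mark on the x line at a given height, under the
    equal-scale assumption stated above."""
    return 2 * height


first = crossing(-20, 15, 15, -20)  # both lines correspond to x = -5
second = crossing(5, 0, -10, 15)    # both lines correspond to x = +5
print(first, x_graduation(first[1]))
print(second, x_graduation(second[1]))
```

Under these assumptions both crossings fall midway between the two outer scales, so the x line is the middle parallel, and its graduations run at twice the numerical value of the outer scales for the same height.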

Two examples may be given of nomograms for dealing with
medical problems. The first relates to the calculation of the surface
area of the human body from known height and weight. Feldman
and Umanski* have published a nomogram of the DuBois equation

S = 71.84 W^{0.425} H^{0.725}

This is reproduced as Fig. 55.

The second example is one of Lawrence J. Henderson’s† nomograms
relating six variables in the physiology of the blood. It is
shown in Fig. 56.

* Feldman, W. M., and Umanski, A. J. V.: The Nomogram as a Means of
Calculating the Surface Area of the Living Human Body, Lancet, vol. 202, February
11, 1922, pp. 273, 274.

† Henderson, L. J.: Blood as a Physicochemical System, Jour. Biol. Chem.,
vol. 46, pp. 411-419, 1921.

Fig. 54. — Construction of addition nomogram. See text.

Fig. 55. — Nomogram for S = 71.84 W^{0.425} H^{0.725}, where S = surface in sq.
cm., W = weight in kg., and H = height in cm. A straight line joining given
values of W and H cuts the middle scale at the correct value of S. Thus a line joining
the point 24 on the weight scale, with the point 110 on the height scale, will cut the
surface scale at a point corresponding to 8375, which means that the surface area of a
person 24 kilograms in weight and 110 cm. in height is 8375 sq. cm. (From Feldman
and Umanski.)
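The reading quoted in the legend of Fig. 55 may be checked by direct computation. The sketch below simply evaluates the DuBois equation as printed; the function name is hypothetical. It also evaluates the logarithmic form of the equation, which suggests why a straight-edge nomogram is possible here: taking logarithms turns the product into a sum of the same kind as x = a + b. Whether the published nomogram was constructed in just this way is not stated in the text.

```python
import math


def dubois_surface_area(weight_kg, height_cm):
    """Body surface in sq. cm. from the DuBois equation
    S = 71.84 * W**0.425 * H**0.725."""
    return 71.84 * weight_kg ** 0.425 * height_cm ** 0.725


# The example given in the legend of Fig. 55.
surface = dubois_surface_area(24, 110)
print(round(surface))

# Logarithmic form: log S = log 71.84 + 0.425 log W + 0.725 log H.
log_surface = (math.log10(71.84)
               + 0.425 * math.log10(24)
               + 0.725 * math.log10(110))
print(round(10 ** log_surface))
```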


The six variables involved in this nomogram are the free and
combined oxygen of the whole blood, [O2] and [HbO2]; the free
and combined carbonic acid of the serum, [H2CO3] and [BHCO3];
the hydrogen-ion concentration of the serum, expressed as [pH];
and the chlorid concentration of the serum, [BCl].

This nomogram expresses at once the results of Barcroft
upon the oxygen dissociation curve of blood, and of Christiansen,
Douglas, and Haldane on the carbon dioxid dissociation curve,
as well as the peculiarities of the acid-base equilibrium, and of the
distribution of chlorids. Obviously it has the property that if values
are assigned to any two of the variables, all six are determined.
Henderson’s original paper must be consulted for further
discussion of this nomogram. Anyone interested in the application
of nomography to medical problems should read the same author’s
Silliman Lectures.13 They represent the most extensive, varied
and penetrating application of the method to biological problems
which has yet been made.

Fig. 56. — Nomogram for certain physicochemical relations of the blood. (From
L. J. Henderson.)

A further  example  illustrating  the  application  of  nomography 
to  statistical  material  and  problems  is  given  in  Chapter  VIII, 
where  a life  table  nomogram  is  presented  and  discussed. 


ELEMENTARY  STANDARDS  IN  GRAPHIC  WORK 

In 1915 a widely representative joint committee of engineering,
statistical, economic, biologic, and other societies, interested in
the promotion of sound methods of graphic presentation of
data, published10 a preliminary report on standards. This report
is so valuable for the beginner in this type of work that,
with the permission of the Chairman of the committee, Mr. Willard
C. Brinton, its essential parts are here reproduced in full.

[Figs. 1 and 2 of the report. Data shown: year 1900, 270,588 tons;
year 1914, 555,031 tons.]

2 Where possible represent quantities by linear magnitudes as
areas or volumes are more likely to be misinterpreted.

3 For a curve the vertical scale, whenever practicable, should be so
selected that the zero line will appear on the diagram.


4 If the zero line of the vertical scale will not normally appear on
the curve diagram, the zero line should be shown by the use of a
horizontal break in the diagram.

5 The zero lines of the scales for a curve should be sharply
distinguished from the other coordinate lines.

6 For curves having a scale representing percentages, it is usually
desirable to emphasize in some distinctive way the 100 per cent line or
other line used as a basis of comparison.

7 When the scale of a diagram refers to dates, and the period
represented is not a complete unit, it is better not to emphasize the
first and last ordinates, since such a diagram does not represent the
beginning or end of time.


Population 


GRAPHIC  REPRESENTATION  OF  STATISTICAL  DATA 


8  When  curves  are  drawn  on 
logarithmic  coordinates,  the  limit- 
ing lines  of  the  diagram  should 
each  be  at  some  power  of  ten  on 
the  logarithmic  scales. 


Population 


Fig.  9A 


Population 


Fig.  9B 


9  It  is  advisable  not  to  show  any  more  coordinate  lines  than 
necessary  to  guide  the  eye  in  reading  the  diagram. 


10 The curve lines of a diagram should be sharply distinguished
from the ruling.


11 In curves representing a series of observations, it is advisable,
whenever possible, to indicate clearly on the diagram all the points
representing the separate observations.


[Figs. 11A, 11B, 11C. Axis labels: Population; Analysis, % Ash;
Pressure, lbs. per sq. in.]


12  The  horizontal  scale  for 
curves  should  usually  read  from 
left  to  right  and  the  vertical  scale 
from  bottom  to  top. 

[Fig. 12.]


[Figs. 13A, 13B, 13C. Axis label: Gain.]


13  Figures  for  the  scales  of  a diagram  should  be  placed  at 
the  left  and  at  the  bottom  or  along  the  respective  axes. 


[Figs. 14A, 14B, 14C.]


14 It is often desirable to include in the diagram the numerical
data or formulae represented.


15 If numerical data are not included in the diagram it is desirable
to give the data in tabular form accompanying the diagram.


[Fig. 15. Axis label: Population.]

Year      Population
1840      17,069,453
1850      23,191,876
1860      31,443,321
1870      38,358,371
1880      50,155,783
1890      62,622,250
1900      75,994,575
1910      91,972,266

16  All  lettering  and  all 
figures  on  a diagram  should 
be  placed  so  as  to  be  easily 
read  from  the  base  as  the 
bottom,  or  from  the  right- 
hand  edge  of  the  diagram 
as  the  bottom. 


17  The  title  of  a diagram  should  be 
made  as  clear  and  complete  as  possible. 
Sub-titles  or  descriptions  should  be 
added  if  necessary  to  insure  clearness. 

CHAPTER VII


RATES  AND  RATIOS 

In  Chapter  III  the  raw  materials  of  statistics,  the  absolute 
frequencies  of  occurrence  of  events,  were  discussed.  In  many 
sorts  of  problems  absolute  frequencies  will  not  alone  suffice  for 
the  intelligent  discussion  of  problems.  The  reason  for  this  is 
simple.  To  say  that  in  one  city  2596  persons  died  of  tuberculosis 
in  a year,  while  in  another  city  1304  died  in  the  same  year  of  the 
same  disease  conveys  no  particularly  useful  information.  It  is 
essential  to  know,  in  addition,  the  populations  of  the  two  cities, 
at  least.  Otherwise  it  is  impossible  to  form  any  conception  of 
whether  tuberculosis  was  more  fatal  in  the  one  place  than  in  the 
other. In short, it is necessary to know the number exposed to the
risk of the happening of a particular event, before the full significance
of the statistics of that event can be appreciated.

The  calculation  of  rates  in  statistical  work  consists  in  arriving 
at  frequencies  of  occurrence  relative  to  the  number  exposed  to 
risk of the occurrence. Properly calculated rates are said to
measure:

In  the  case  of  deaths,  the  force  of  mortality. 

In  the  case  of  births,  the  force  of  natality. 

In  the  case  of  sickness,  the  force  of  morbidity. 

The  “force  of  mortality”  is  expressed  as  the  proportion  of 
those  exposed  to  risk  who  die.  Thus,  if  100  persons  are  truly 
exposed  to  risk  of  dying  within  a given  year,  and  3 die,  the 
force  of  mortality  within  the  time  limit  of  that  year  is  3 per  cent. 

It  should  be  noted  at  the  outstart  of  the  discussion  of  rates 
that  “number  exposed  to  risk”  does  not  always,  or  indeed  usually, 
mean  precisely  the  same  thing  as  “number  living.”  For  example, 
suppose  that  in  a particular  community,  say  New  York  State  in 
1900,  452  persons  died  of  puerperal  septicemia,  and  in  the  same  state 
the same year there were living 7,284,461 persons. These facts
do not imply that the true force of mortality of puerperal septicemia
was 452 ÷ 7,284,461 = .00006, or 6 per 100,000.

The  true  force  of  mortality  must  be  quite  different  from  this 
because: 

(a)  Males  cannot  have  puerperal  septicemia,  and  are,  therefore, 
not  at  risk  of  dying  from  this  disease. 

(b)  Females  under  ten  or  over  sixty  years  of  age  are  not  exposed 
to  risk  of  dying  from  this  disease,  because  they  are  outside  the 
reproductive  period  of  life. 

(c) Women not in the puerperium, i. e., who have not recently
been pregnant, are not exposed to risk of death from this
disease.

So  then  it  appears  that  from  the  figure  of  7,284,461  living  there 
must  be  subtracted  at  the  start  all  the  males,  and  then  all  the 
females  except  those  in  a certain  physiologic  state.  The  number 
of  live  births  in  New  York  State  in  1900  was  143,156.  Now, 
adding  to  this  number  4 per  cent,  of  itself,  to  correct  roughly  for 
stillbirths,  multiple  births,  etc.,  the  number  148,900  may  be  taken 
approximately  to  represent  the  number  of  women  who  during 
that year were in the puerperal state. So then the figure for force
of mortality from this disease becomes roughly somewhere in the
neighborhood of 452 ÷ 148,900 = .003, or 300 per 100,000, a
very different figure indeed from the 6 per 100,000 with which
we started.
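
The arithmetic of this example can be retraced in a short sketch. The figures are those quoted in the text; the variable names are illustrative assumptions, and the 4 per cent allowance is the rough correction described above, not a precise adjustment.

```python
# Figures quoted in the text for New York State in 1900.
deaths_puerperal_septicemia = 452
total_population = 7_284_461
live_births = 143_156

# Wrong denominator: everyone living, most of whom are not at risk.
crude_rate = deaths_puerperal_septicemia / total_population

# Better denominator: live births plus 4 per cent, as a rough count of
# women who were in the puerperal state during the year.
exposed_to_risk = live_births * 1.04
corrected_rate = deaths_puerperal_septicemia / exposed_to_risk

print(f"crude:     {crude_rate * 100_000:.1f} per 100,000")
print(f"corrected: {corrected_rate * 100_000:.1f} per 100,000")
print(f"exposed to risk: about {exposed_to_risk:,.0f}")
```

The unrounded corrected figure comes out slightly above 300 per 100,000 because the text rounds the number exposed to 148,900 and the rate to .003.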

My  colleague,  Dr.  W.  T.  Howard,1  has  discussed  in  detail  the 
true  risk  of  mortality  in  child-bearing,  and  his  more  precise  and 
thorough  treatment  of  the  matter  should  be  read  in  connection  with 
the  simple,  rough  example  given  above. 

This  same  fallacy  of  using  an  incorrect  figure  for  the  exposed 
to  risk  often  appears  in  medical  statistics.  An  example  may  be 
cited.  Litchfield  and  Hardman*  reported  excellent  results  in  the 
treatment of laryngeal diphtheria by suction to remove the
membrane. They presented a table, here reproduced as Table 14, to
contrast  their  results  before  and  after  the  use  of  this  treatment. 

* Litchfield,  H.  R.,  and  Hardman,  R.  P.:  Suction  in  the  Treatment  of  Laryngeal 
Diphtheria,  Jour.  Amer.  Med.  Assoc.,  vol.  80,  pp.  524-526,  1923. 



TABLE 14

Comparative Data on Treatment of Laryngeal Diphtheria (Litchfield and
Hardman’s Table 1)

                                           May-December
                                          1921.     1922.
Total cases of laryngeal diphtheria        158       106
No local treatment (mild cases)             43        21
Applicator treatment                        13        12
Applicator and intubation                   18         0
Intubation                                  84        18
Suction                                      0        46
Suction and intubation                       0         9
Total deaths                                41        14
Mortality                                  26-%      13+%


Now,  the  mortality  percentages  given  in  the  last  line,  26—  per 
cent,  in  1921  (no  suction  treatment),  and  13+  per  cent,  in  1922 
(suction  treatment  in  some  cases),  are  reckoned  on  the  basis  41/158 
= .26,  and  14/106  = .13.  But  it  appears  that  in  1921  there  were 
43  cases  so  mild  as  to  be  given  “no  treatment”  (text  p.  526),  and 
in 1922 there were 21 cases of the same sort. Clearly these 64
patients were not a proper part of the “universe of discourse,”
if that universe, as is the fact, concerns itself with discourse about
different modes of treatment. They were not treated, therefore
they  cannot  possibly  have  any  bearing  upon  the  relative  merits 
of  different  kinds  of  local  treatment,  either  one  way  or  the  other. 
Furthermore,  none  of  them  died,  as,  of  course,  was  to  be  expected. 
Actually  there  were  treated  in  1921,  158  — 43  = 115  cases,  and 
in  1922,  106  — 21  = 85  cases.  Of  these  treated  cases,  41  died 
in  1921,  and  14  in  1922.  Hence  the  true  comparative  mortality 
rates  per  cent,  in  the  two  years  of  this  experience,  are 


$$\text{For 1921, } \frac{41 \times 100}{115} = 36 \text{ per cent.}$$

$$\text{For 1922, } \frac{14 \times 100}{85} = 16 \text{ per cent.}$$

Or,  in  other  words,  calculated  on  a proper  basis  the  results  in 
1922  were  even  better  relatively  than  those  stated  by  the  authors. 
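
A minimal sketch of the recalculation, using only the counts of Table 14 (the dictionary and variable names are illustrative assumptions):

```python
# Counts taken from Table 14 (May-December of each year).
total_cases = {1921: 158, 1922: 106}
mild_untreated = {1921: 43, 1922: 21}
deaths = {1921: 41, 1922: 14}

as_published = {}   # mortality per cent on all cases
on_treated = {}     # mortality per cent on treated cases only
treated_cases = {}

for year in total_cases:
    treated_cases[year] = total_cases[year] - mild_untreated[year]
    as_published[year] = 100 * deaths[year] / total_cases[year]
    on_treated[year] = 100 * deaths[year] / treated_cases[year]
    print(year, treated_cases[year],
          round(as_published[year], 1), round(on_treated[year], 1))

# Relative improvement: 1922 mortality as a fraction of 1921 mortality.
ratio_published = as_published[1922] / as_published[1921]
ratio_treated = on_treated[1922] / on_treated[1921]
print(round(ratio_published, 2), round(ratio_treated, 2))
```

On the treated cases alone the 1922 mortality is a smaller fraction of the 1921 mortality than on all cases, which is the sense in which the results were relatively better than the authors stated.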

DEFINITION  AND  CLASSIFICATION  OF  RATES  AND  RATIOS 

The  basic  relative  figures  of  vital  statistics  may  conveniently 
be  divided  into  rates  and  ratios. 

A rate has the following form:

$$R = \frac{a}{a + b}, \qquad \text{(i)}$$

which, expressed in words, means

$$\text{Rate} = \left\{ \frac{\text{The number of times a specified kind of event actually occurs}}{\text{The whole number of exposures to risk of its occurrence}} \right\},$$

the denominator being, i. e., the number of times the event actually
occurs (a) + the number of times it might occur, but does not (b).

The  part  of  the  right-hand  member  of  the  rate  equation  which 
is  in  brackets  limits  the  universe  of  discourse  to  which  the  rate 
applies  to  a particular  kind  of  event,  as  for  example  “death”  or 
“birth.” 

A rate  is  also  limited  to  a particular  universe  of  discourse  in 
time.  This  is  done  by  preliminary  definition.  Thus  a death-rate  is 
“annual,”  referring  to  the  deaths  in  a specified  year,  or  “monthly” 
or  “weekly,”  etc. 

A rate  as  defined  above  states  the  result  on  an  individual  basis. 
Numerically  it  will,  in  this  form,  obviously  be  always  a decimal 
fraction.  In  order  to  put  rates  into  whole  numbers  rather  than 
fractions,  so  that  they  may  be  more  easily  read  and  comprehended, 
it  is  the  customary,  and  now  generally  conventionalized,  practice  to 
multiply  rates  on  an  individual  basis,  as  above  defined,  by  some 
multiple  of  10.  They  thus  become  rates  per  cent,  (when  the  rate 
on  an  individual  base  is  multiplied  by  100);  or  rates  per  thousand 
(when  the  rate  on  an  individual  base  is  multiplied  by  1000),  and 
so  on. 
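
Equation (i) and the conventional multiplier can be put together in a short sketch. The function name `rate` and its parameter names are illustrative assumptions; the figures are those of the earlier example in which 3 of 100 persons exposed to risk die within the year.

```python
def rate(occurred, did_not_occur, per=1):
    """Equation (i): a / (a + b), multiplied by `per`
    (1 = individual basis, 100 = per cent, 1000 = per thousand)."""
    exposed = occurred + did_not_occur
    if exposed <= 0:
        raise ValueError("the number exposed to risk must be positive")
    return per * occurred / exposed


# 100 persons exposed to risk, of whom 3 die and 97 do not.
individual_basis = rate(3, 97)
per_cent = rate(3, 97, per=100)
per_thousand = rate(3, 97, per=1000)
print(individual_basis, per_cent, per_thousand)
```

The three results state the same probability; only the multiplier differs.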

The  commonly  employed  rates  in  biostatistical  work  may  be 
classified  as  follows: 

A.  Death-rates  (Mortality  rates). 

1.  Observed actual death-rates, obtained by the direct
application of equation (i), without assumptions:

(a)  Crude death-rates.

(b)  Specific death-rates.

(c)  Infant mortality rates.

(d)  Case fatality rates.

2.  Theoretic death-rates based upon certain assumptions:

(a)  Standard  (or  standardized)  death-rates. 

(b)  Corrected  death-rates. 

(These theoretic death-rates will be considered in detail in
Chapter IX, after certain requisite preliminaries have been
explained in Chapter VIII.)

B.  Birth-rates  (Natality  rates). 

1.  Observed  actual  birth-rates  obtained  from  equation  (i): 

(a)  Crude  birth-rates. 

(b)  Specific  birth-rates. 

2.  Theoretic  birth-rates,  based  upon  certain  assumptions: 

(a)  Standardized  birth-rates. 

(b)  Corrected birth-rates.

C.  Morbidity  Rates. 

1.  Observed,  actual: 

(a)  Crude. 

(b)  Specific. 


D.  Marriage Rates.

E.  Divorce Rates.

(As these two categories fall, in actual practice, rather in the
field of demographic statistics than in that of medical statistics,
they will not be further considered.)

Each  of  the  types  above  mentioned  will  be  discussed  in  detail 
farther  on. 

Before  doing  so,  however,  it  will  be  well  to  define  and  classify 
the  ratios  commonly  used  in  biostatistics. 

A ratio is a relative figure in fractional form, but distinguished
from a rate by the fact that the denominator does not denote the
number exposed to risk of occurrence of the event whose
frequency of occurrence is given by the numerator.

$$R_0 = \frac{a}{c + d}, \qquad \text{(ii)}$$

where

R0 = a ratio,

a = the number of times an event of some specified kind occurs,

c + d = the number of times some other kind of event, in general
different from the a event, occurs, although in some cases c = a.

There are but two sorts of ratios at all commonly employed in
biostatistical work, viz.:

(a)  Death ratios.

(b)  Birth-death ratio (or Vital Index).

Each of these different sorts of rates and ratios will now be
discussed and illustrated in some detail. But before going on to
this it is important to emphasize particularly one point. It is
this: As defined above, each of the rates mentioned is
mathematically an expression measuring a probability. When in a
later chapter the discussion of the theory of probability is
undertaken this fact about death-rates, birth-rates, etc., will be more
easily and fully appreciated. But it is desired to bring it out here
in anticipation of the more formal discussion of probability in
order that the reader may fully realize from the start that what
a death-rate or a birth-rate really measures, in a mathematical
sense, is always a probability. The conventional use of the
constant multiplier of 100 or 1000, etc., in stating rates tends somewhat
to disguise (at least to the unwary) this fact, but in the detailed
discussion of rates pains will be taken to state formally what
probability it is that each particular rate measures.

CRUDE  DEATH-RATES 

Here the fundamental equation (i) becomes

$$R_c = \frac{D}{P},$$

where

Rc = crude death-rate,

D = deaths from all causes,

P = total population = D + (P - D) = P.

(Crude death-rates are usually stated “per 1000” or “per 100,000.”)

Nothing could be less refined than this. The deaths are not
separated as to cause, and the entire population is assumed to be
at risk of death. The annual crude death-rate measures the
probability of a person, regardless of age, sex, race, or occupation,
dying within one year, from any cause whatever, in a population
constituted in respect of its age, sex, racial and occupational
distribution, as the population under discussion happens to be. A
crude death-rate, in other words, is an absolutely accurate and
precise measure of something which, because of its heterogeneous,
composite, unanalyzed character, is not particularly worth
measuring accurately. So many variables besides those essentially
lethal can (and do) influence the stated values of crude death-rates
as to make them rather untrustworthy for any but the broadest
and roughest conclusions and estimates. Taken alone and by
themselves, in the complete absence of any other knowledge than
that furnished by the crude rates themselves, they must be
employed with the utmost caution and reservation in comparisons of
one locality or one time with another. The reasons for the great
unreliability of crude rates for comparative purposes will more
and more clearly appear as we proceed.

Another class of crude death-rates is given by the expression

$$R'_c = \frac{D'}{P},$$

where D' = deaths from a particular cause or group of causes only,
and all the other letters have the same significance as before.
Thus we might have the crude death-rate for tuberculosis of the
lungs. This represents the first step in specification, but does not
go far. Indeed R'c may certainly be said in a good many cases to
give a wholly false measure. It does not measure any rational
probability, because P still is the total living population. But
as we have seen earlier not all P is exposed to risk of dying, for
example, of puerperal septicemia. Therefore the probability given
by R'c is in that case a false one. Rc does measure a true
probability, because all P is exposed always to the risk of dying of
something or other, but it is not a very important or interesting
probability. In short, Rc is rather a fool, while R'c is a knave.

The crude rate from all causes Rc may be used with a fair
degree of safety for comparing the relative mortality of the same place
(city, state, etc.) at different times, provided the periods
compared are not too far apart, and provided the place has not
undergone rapid growth or decline in population during the period.
The reason for this is that in fairly stable, large communities
the age and sex constitution of the population changes only very
slowly. This fact is well illustrated by Fig. 57, which shows the
age distribution of the living population of Amsterdam, at nine
consecutive census periods (1829 to 1920 inclusive). It is at once
apparent that in this long period the age constitution of the
population of Amsterdam has not shown any very considerable changes.


Fig.  57. — The  proportion  per  thousand  of  the  total  population  of  Amsterdam  falling  in 
different  age  classes,  at  each  of  nine  census  periods  between  1829  and  1920. 


It  has  been  shown  analytically  by  Lotka2  that,  under  certain 
conditions  not  widely  different  from  those  which  prevail  in  large 
human  population  aggregates,  the  age  distribution  tends  to  converge 
toward  a stable  normal  condition  or  state. 

The crude rate from all causes Rc is wholly unreliable as an
index of the relative mortality in different places, unless it be first
shown by a preliminary investigation that the populations of the
places compared are substantially identical in age and sex
distribution, a condition which is usually not carried out.
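
The point can be illustrated with a small sketch. Every number in it is hypothetical and chosen only for round arithmetic: two places are given identical death-rates within each age class, and differ solely in the proportion of older persons.

```python
# Hypothetical death-rates per 1000, identical in both places.
rate_per_1000 = {"under 50": 5.0, "50 and over": 50.0}

# Hypothetical populations differing only in age distribution.
place_a = {"under 50": 9000, "50 and over": 1000}
place_b = {"under 50": 7000, "50 and over": 3000}


def crude_rate_per_1000(population, rates):
    """Crude death-rate D / P, stated per 1000, where D is the sum of
    the deaths expected in each age class."""
    deaths = sum(population[age] * rates[age] / 1000 for age in population)
    return 1000 * deaths / sum(population.values())


crude_a = crude_rate_per_1000(place_a, rate_per_1000)
crude_b = crude_rate_per_1000(place_b, rate_per_1000)
print(crude_a, crude_b)
```

The crude rate of the second place is nearly double that of the first, although a person of any given age runs exactly the same risk in both.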

SPECIFIC DEATH-RATES

Here the fundamental equation becomes

$$R_s = \frac{D_e}{E},$$

where

Rs = specific death-rate,

De = deaths in a specified class of the population,

E = number exposed to risk of dying, in the same specified class of the
population from which the deaths come.

(Specific death-rates are usually stated as “per 1000.”)

In actual statistical practice at the present time death-rates
are commonly made specific with reference only to age and sex.
This means a situation like the following: In a community A there
were living in a particular year say 100 males, the age of each of
whom was between 12 and 12.99 years. Of these persons say
10 died within the year. Then Ras = 10/100 = 0.1, which means that
the annual death-rate, specific for age and sex (Ras), in this
community was 0.1 for males between twelve and thirteen years of
age, or 100 per thousand.

Specific  death-rates  are  the  true  and  best  measures  of  the 
force  of  mortality.  They  furnish  a real  and  meaningful  measure 
of  the  probability  that  certain  specified  kinds  of  persons  will  die 
within  the  time  period  (usually  one  year)  specified  in  forming  the 
rate. From age specific death-rates (which the English commonly
speak of as measures of “mortality at ages”) is derived all the
really fundamental knowledge which we have of the laws of
mortality.

It  will  be  well  at  this  point  to  put  before  the  reader  a definite 
picture  of  the  form  of  the  specific  death-rate  curve  from  all  causes. 
This  is  done  in  Table  15  and  Fig.  58,  in  which  the  rates  are  specific 
for  quinquennial  age  groups. 

It will be noted that this specific death-rate curve has a
characteristic form. Starting at a high point in earliest infancy the
specific rate drops till it reaches a low point in the age group 10-14.
From that point on it rises steadily, though not entirely evenly,
till the end of the life span. The specific death-rates are lower
in females than in males at every age period in life except the
first (under 1).

TABLE 15

Age and Sex Specific Death-rates, per 1000 Living, from All Causes for the
U. S. Registration Area (Exclusive of North Carolina) in 1910.
(Author’s Computation from Census Bureau Data.)

Ages.            Males.    Females.
Under 1          124.4     143.4
1-4               15.1      13.8
5-9                3.7       3.5
10-14              2.5       2.4
15-19              4.1       3.7
20-24              6.0       5.2
25-29              6.8       6.1
30-34              8.0       6.8
35-39              9.8       7.8
40-44             11.6       8.9
45-49             14.5      11.0
50-54             18.5      14.6
55-59             25.7      20.6
60-64             36.1      29.4
65-69             51.4      44.3
70-74             75.1      66.8
75-79            112.2     100.9
80-84            168.1     155.9
85-89            237.9     222.7
90-94            313.0     309.7
95-99            410.2     368.9
100 and over     494.2     471.7
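
The statements made about the form of the curve can be checked directly against the figures of Table 15. The sketch below simply transcribes the table as printed here; the variable names are illustrative assumptions.

```python
# (age group, male rate, female rate) per 1000 living, from Table 15.
table_15 = [
    ("Under 1", 124.4, 143.4), ("1-4", 15.1, 13.8), ("5-9", 3.7, 3.5),
    ("10-14", 2.5, 2.4), ("15-19", 4.1, 3.7), ("20-24", 6.0, 5.2),
    ("25-29", 6.8, 6.1), ("30-34", 8.0, 6.8), ("35-39", 9.8, 7.8),
    ("40-44", 11.6, 8.9), ("45-49", 14.5, 11.0), ("50-54", 18.5, 14.6),
    ("55-59", 25.7, 20.6), ("60-64", 36.1, 29.4), ("65-69", 51.4, 44.3),
    ("70-74", 75.1, 66.8), ("75-79", 112.2, 100.9), ("80-84", 168.1, 155.9),
    ("85-89", 237.9, 222.7), ("90-94", 313.0, 309.7), ("95-99", 410.2, 368.9),
    ("100 and over", 494.2, 471.7),
]

# Age group in which each sex has its lowest specific death-rate.
lowest_male = min(table_15, key=lambda row: row[1])[0]
lowest_female = min(table_15, key=lambda row: row[2])[0]

# Age groups in which the female rate exceeds the male rate.
female_higher = [age for age, male, female in table_15 if female > male]

# Does the rate rise at every step after the 10-14 group?
start = [row[0] for row in table_15].index("10-14")
males_rise = all(a[1] < b[1] for a, b in zip(table_15[start:], table_15[start + 1:]))
females_rise = all(a[2] < b[2] for a, b in zip(table_15[start:], table_15[start + 1:]))

print(lowest_male, lowest_female, female_higher, males_rise, females_rise)
```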

Specific death-rates can obviously be calculated for each
separate cause of death, and will furnish exact and useful information
about comparative forces of mortality.

The most extensive compilation of age specific death-rates for
the United States is contained in Mortality Rates 1910-1920 10
published by the Census Bureau. The text and tables of this report
should be studied carefully, in order to get a general understanding
of human mortality statistics. They will be found useful for
reference in many connections.

It is apparent that the specificity of death-rates may be
extended to any degree, provided the necessary data relative to
population and to deaths are available. For a really penetrating
insight into the forces of mortality, both for purposes of research
and the administration of public health, death-rates ought to be
made specific for the following factors:

1.  Age.

2.  Sex.

3.  Race (or country of birth of person and parents at least).
Race will include color.

4.  Occupation.

5.  Locality of dwelling (urban or rural).


Fig. 58. Age and sex specific death-rates from all causes for the U. S.
Registration Area (exclusive of North Carolina) in 1910. Plotted from data
of Table 15, on an arithlog grid.

Each of these factors more or less profoundly influences the
force  of  mortality.  Death  certificates  carry  the  necessary  data 
(at  least  theoretically,  and  actually  if  properly  filled  out)  regarding 
deaths.  Every  ten  years  the  census  collects  the  necessary  data 
regarding  the  population.  If  only  these  data  could  be  properly 
tabulated  and  published  it  would  be  possible  to  calculate  in  census 
years  the  death-rates  specific  for  the  above  five  factors.  Eventually 
this  will  surely  be  done.  The  sciences  of  medicine  and  hygiene 
will  imperiously  demand  it.  In  the  meantime  we  make  shift  to  get 
along  by  groping  in  the  dark  in  respect  of  all  factors  except  age,  sex, 
and  urban  or  rural  dwelling. 

The  sort  of  probability  which  a death-rate  specific  for  the 
above  five  factors  would  measure  is,  for  example,  the  probability 
that  a male  person,  aged  twenty,  native  born  of  native  white 
parents,  living  in  the  country  and  by  occupation  a farmer,  would 
die  within  one  year. 

INFANT  MORTALITY  RATES 

Here the fundamental equation (i) becomes

$$R_i = \frac{D_i}{B},$$

where

Ri = infant mortality rate,

Di = deaths of infants under one year of age,

B = births.

(Infant mortality rates are usually stated as “per 1000.”)

The  question  which  will  inevitably  occur  to  the  reader’s  mind 
at  this  point  is:  Why  not  use  the  age  specific  death-rate  for  age 
under  one  as  the  measure  of  infant  mortality?  To  which  the 
answer  is,  Such  would  be  the  practice  if  it  were  not  for  the  difficulty 
of  getting  accurately  (or  annually)  a count  of  the  population 
under  one  year  of  age.  But  because  this  is  difficult  and  the  results 
are  known  to  contain  large  errors,  whereas  the  registration  of  births 
is  or  can  be  made  accurate,  the  form  of  death-rate  given  above  is 
generally  used  as  the  measure  of  infant  mortality  rather  than  the 
simple  age  specific  death-rate  under  one. 

The theory on which the formula for Ri, given above, is based,
is obvious. The number of babies born in a given year is held to
be at least a fair index of the number of babies exposed to risk of
dying within the year under one year of age. Actually, of course,
it does not measure the exposed to risk of dying under one year.
Because, consider a given calendar year; the baby born on
December 1st of that year is only exposed for one month to risk of
dying under one year of age within that calendar year. But, on the
other hand, given a fairly stable population, and accurate birth
registration, the error in the absolute value of the infant mortality
rate introduced by the relations just mentioned, will be a
constant one over fairly long periods of time, and, because constant,
negligible when the rates are used for comparative purposes.
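
The exposure argument can be made concrete with a short sketch that counts the days between a birth and the end of the calendar year of birth. The function name and the choice of 1921 as the calendar year are illustrative assumptions.

```python
from datetime import date


def days_exposed_in_birth_year(birth):
    """Days from the date of birth to the end of the same calendar
    year, i.e. the exposure to risk falling within that year."""
    return (date(birth.year + 1, 1, 1) - birth).days


born_january_1 = days_exposed_in_birth_year(date(1921, 1, 1))
born_december_1 = days_exposed_in_birth_year(date(1921, 12, 1))
print(born_january_1, born_december_1)
```

A baby born on January 1st is exposed for the whole year, while one born on December 1st is exposed for only the 31 days of December, although both count equally in B.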

In the present state of knowledge upon the subject it is
impossible to state exactly what the probability is that is measured
by Ri.

The  infant  mortality  rates,  as  defined  by  Ri,  for  American 
cities  of  100,000  or  more  population  in  1920  are  given  in  Table  16. 
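
As a rough summary of the spread among the cities, the rates of Table 16 can be reduced to a few figures. The sketch below transcribes the 47 rates in the order of the table; the variable names are illustrative assumptions.

```python
import statistics

# Infant mortality rates per 1000 live births, 1920, in the order of
# Table 16 (Lowell, Mass. first; Seattle, Wash. last).
rates_1920 = [
    135, 129, 122, 119, 114, 111, 108, 106, 105, 104,
    103, 101, 100, 99, 99, 99, 96, 96, 95, 94,
    92, 92, 91, 91, 91, 89, 89, 87, 87, 86,
    85, 85, 85, 85, 84, 84, 82, 77, 73, 72,
    71, 71, 71, 65, 62, 60, 57,
]

highest = max(rates_1920)
lowest = min(rates_1920)
spread = highest - lowest
median_rate = statistics.median(rates_1920)
print(len(rates_1920), highest, lowest, spread, median_rate)
```

The highest rate is more than twice the lowest, which gives a first measure of the variation among cities.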

TABLE 16

Infant Mortality Rates (Deaths Under One Year of Age per 1000 Live Births)
in Registration Cities of 100,000 Population or More in 1920.
(Rearrangement of Data from Birth Statistics, 1920, p. 26.)

Cities.                     1920 rate.
Lowell, Mass.                  135
Fall River, Mass.              129
New Bedford, Mass.             122
Scranton, Pa.                  119
Richmond, Va.                  114
Pittsburgh, Pa.                111
Kansas City, Kans.             108
Baltimore, Md.                 106
Syracuse, N. Y.                105
Detroit, Mich.                 104
Buffalo, N. Y.                 103
Boston, Mass.                  101
Norfolk, Va.                   100
Hartford, Conn.                 99
Grand Rapids, Mich.             99
Reading, Pa.                    99
Cambridge, Mass.                96
Columbus, Ohio                  96
Youngstown, Ohio                95
Milwaukee, Wis.                 94
Bridgeport, Conn.               92
Omaha, Neb.                     92
Washington, D. C.               91
Indianapolis, Ind.              91
Philadelphia, Pa.               91
Yonkers, N. Y.                  89
Toledo, Ohio                    89
New Haven, Conn.                87
Cleveland, Ohio                 87
Louisville, Ky.                 86
Springfield, Mass.              85
Worcester, Mass.                85
New York, N. Y.                 85
Dayton, Ohio                    85
Rochester, N. Y.                84
Akron, Ohio                     84
Cincinnati, Ohio                82
Albany, N. Y.                   77
St. Paul, Minn.                 73
Salt Lake City, Utah            72
Los Angeles, Calif.             71
Oakland, Calif.                 71
Spokane, Wash.                  71
Minneapolis, Minn.              65
San Francisco, Calif.           62
Portland, Ore.                  60
Seattle, Wash.                  57

It will be noted from this table that there is great variation
among the different cities in the rate of infant mortality. This
variation has been discussed biometrically elsewhere.3 Its
significance, from the standpoint of public health and preventive
medicine, is very great. In the paper referred to it was pointed
out that the facts of variation make it clearer where the
fundamental administrative problems of control of infant mortality lie
than perhaps could be done in any other way. The first step in
the solution of any problem is obviously a clear definition of the
problem itself. We see, as we pass from city to city, town to
town, or rural county to rural county, that the rate of infant
mortality varies greatly. In a hypothetic commonwealth where
the most perfect administrative control over infant mortality
possible or conceivable had been attained this variation would to a
considerable extent disappear, the only residue of diversity
between communities in respect of infant mortality being such as
arose either (1) purely by the operation of chance, that is, from
random sampling, or (2) from the racial composition of the several
populations, or (3) from fundamentally uncontrollable
environmental differences, such as climate, soil, etc., or (4) from some
combination of these factors (1) to (3). Now with the actually
existing condition of variation between different communities in
respect of infant mortality, it is obvious that there probably are
definite and presumably in large degree determinable reasons for
each large particular difference which exists. Just as obviously,
before administrative control can effectively wipe out these
mortality differences and get all communities at or near the level of
the lowest, we must know something about the determining causes
upon which they depend. Efforts to reduce infant mortality have
in the past been attended with considerable success. With the
advance of general sanitation the death-rate under one year of age
has fallen enormously. Greenwood quotes some interesting figures
on the point from Farr, which we may well reproduce here to show
how enormous has been the improvement:


TABLE 17

Showing the Reduction in the Mortality of Infancy and Early Childhood.
(After Greenwood.)

Period.        Percentage deaths under five years.
1730-49                    74.5
1750-69                    63.0
1770-89                    51.5
1790-1809                  41.3
1810-29                    31.8

But  after  such  a decline  as  these  figures  indicate,  to  continue 
the  reduction  presents  a difficult  problem  to  the  administrative 
official.  The  easy  part  of  the  conflict  has  happened  and  is  in  the 
past.  To  continue  the  good  fight  with  the  same  relative  measure 
of  success,  one  presumably  needs  to  know  more  precisely  than  is 
now  known  the  pattern  of  the  causal  nexus  which  controls  and 
determines  the  rate  of  infant  mortality.  The  problem  confronts 
the  administrative  official  or  the  altruistic  organization  in  a specific 
rather  than  a general  manner.  City  A has  a death-rate  under 
one  year  of  age  so  low  that  even  the  most  sanguine  of  hygienic 
optimists  would  hardly  undertake  seriously  to  reduce  it  further  by 
any  significant  amount.  In  City  B,  on  the  other  hand,  babies  die 
like  flies,  only  somewhat  more  rapidly.  City  B differs  in  many 
respects  from  A.  Some  of  these  respects  are  such  as  to  be  easily 
within  the  power  of  control  of  a health  official.  Others,  such  as 
climate  or  the  racial  composition  of  the  population,  for  example, 
are  obviously  beyond  the  possibility  of  any  control  or  modification. 
Others lie between the two extremes, and offer practical
difficulties of varying degrees. What one needs to know is which
particular line of effort will in practice yield the largest return.
And it is real knowledge, not a priori logic, that is wanted. Let a
single example illustrate. It has been maintained that excessive
infant mortality is primarily the resultant of the so-called
“degrading influence” of poverty, and such a contention stirs a warmly
sentimental feeling of agreement in the minds of a well-meaning
public, zealous to do good. This relationship obviously ought to
be true, therefore to a too-common type of mind it must be and is

TABLE 19

Infant Mortality Rates (Deaths Under One Year Per 1000 Births) for Various Countries

(Rearrangement of Data from Birth Statistics, 1920, p. 40, and Birth Statistics, 1925,
Part II, pp. 67-69. Numbers in parentheses denote the year to which the rate
applies.)


Country.                                  Male.                           Female.

Hungary                                   281.9 (1915)    181.0 (1925)    244.6 (1915)    153.8 (1925)
Russia                                    264.9 (1909)                    236.9 (1909)
Chile                                     260.9 (1918)    262.2 (1925)    248.2 (1918)    253.2 (1925)
Ceylon                                    227.8 (1919)    181.3 (1925)    217.3 (1919)    162.1 (1925)
Austria                                   204.2 (1913)    170.7 (1920)    174.6 (1913)    141.8 (1920)
Japan                                     181.8 (1917)    151.1 (1925)    164.2 (1917)    133.3 (1925)
German Empire                             177.1 (1914)    115.8 (1925)    149.2 (1914)    93.9 (1925)
Prussia                                   177.1 (1914)                    150.2 (1914)
Italy                                     174.5 (1916)    125.6 (1925)    157.7 (1916)    113.0 (1925)
Jamaica                                   167.7 (1919)    184.7 (1925)    155.4 (1919)    162.6 (1925)
Bulgaria                                  166.1 (1911)                    145.7 (1911)
Spain                                     163.5 (1917)                    146.1 (1917)
Serbia                                    144.7 (1910)                    132.4 (1910)
Belgium                                   132.1 (1912)    104.4 (1925)    107.2 (1912)    82.4 (1925)
Uruguay                                   124.7 (1920)    121.1 (1925)    109.5 (1920)    108.7 (1925)
France                                    122.7 (1913)    98.9 (1925)     101.7 (1913)    78.6 (1925)
Finland                                   122.6 (1918)    92.9 (1925)     107.5 (1918)    76.6 (1925)
Scotland                                  112.9 (1919)    103.5 (1925)    89.6 (1919)     77.0 (1925)
Denmark                                   101.3 (1919)    90.5 (1925)     81.2 (1919)     68.6 (1925)
United Kingdom                            101.3 (1919)    89.7 (1922)     79.0 (1919)     68.8 (1922)
England and Wales                         100.0 (1919)    84.0 (1925)     77.6 (1919)     65.7 (1925)
Ireland                                   97.3 (1919)     75.0*           [illegible]     60.5*
                                                          97.7† (1925)                    74.4† (1925)
Switzerland                               96.9 (1918)     63.6 (1925)     79.1 (1918)     52.9 (1925)
United States (birth registration area)   95.1 (1920)     79.5 (1925)     76.1 (1920)     63.3 (1925)
Australian Commonwealth                   76.7 (1920)     58.8 (1925)     61.1 (1920)     47.7 (1925)
Sweden                                    76.6 (1916)                     62.5 (1916)
Norway                                    70.6 (1917)     54.3 (1924)     57.0 (1917)     45.9 (1924)
The Netherlands                           55.2 (1919)     66.0 (1925)     43.9 (1919)     50.3 (1925)
New Zealand                               53.6 (1918)     44.0 (1925)     43.0 (1918)     35.6 (1925)

* Irish Free State.
† Northern Ireland.


true. But Greenwood and Brown,4 in what may fairly be regarded as
one of the most thoroughly sound, critical, and penetrating contributions
which have yet been made to the problem of infant mortality,
are unable “to demonstrate any unambiguous association between
poverty . . . and the death-rate of infants.”

The  plain  fact  is  that  before  control  or  ameliorative  measures 
can  be  applied  with  the  maximum  of  efficient  economy  to  the 
general  public  health  problem  of  infant  mortality  there  is  need  to 
know  a great  deal  more  than  we  now  do  about  the  quantitative 
influence  of  the  general  factors  which  induce  spatial  and  temporal 
differences  in  the  rate  of  that  mortality.  But  first  it  will  be  helpful 
to  get  an  adequate  conception  of  the  magnitude  and  character  of 
the  differences  themselves. 

The  distribution  of  variation  in  infant  mortality  in  cities  and 
rural  areas  in  the  United  States  is  shown  in  Table  18,  taken  from 
the  paper  cited. 

The  infant  mortality  rates  of  various  countries  are  given  in 
Table  19. 

It  is  evident  from  Table  19  that  in  the  period  covered  by  the 
data  the  infant  mortality  rate  declined  notably  in  most  of  the 
countries.  The  outstanding  exceptions  to  this  rule  are  Chile, 
Jamaica,  Uruguay,  and  The  Netherlands.  It  will  also  be  noted 
that  in  every  country  listed  the  infant  mortality  rates  are  lower  for 
females  than  for  males. 
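
The comparison of the earlier and later rates in Table 19 can be made explicit by computing the relative change for each country. The following is a minimal sketch, using only the male rates of a few countries as transcribed from the table; the function name `percent_change` and the choice of countries are illustrative assumptions, not part of the original text.

```python
# Male infant mortality rates (deaths under one year per 1000 births)
# transcribed from Table 19 as (earlier rate, later rate).
male_rates = {
    "Hungary": (281.9, 181.0),
    "Chile": (260.9, 262.2),
    "Jamaica": (167.7, 184.7),
    "Uruguay": (124.7, 121.1),
    "The Netherlands": (55.2, 66.0),
    "New Zealand": (53.6, 44.0),
}


def percent_change(earlier, later):
    """Relative change from the earlier to the later rate, in per cent."""
    return 100.0 * (later - earlier) / earlier


for country, (earlier, later) in male_rates.items():
    change = percent_change(earlier, later)
    direction = "rose" if change > 0 else "fell"
    print(f"{country:16s} {earlier:6.1f} -> {later:6.1f}  "
          f"({direction} {abs(change):.1f} per cent)")
```

The two rates for a country do not always refer to the same pair of years, so the changes are not strictly comparable from one country to another.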

CASE  FATALITY  RATES 

Here the fundamental equation becomes

$$R_f = \frac{D_c}{C}$$

where

$R_f$ = case fatality rate,

$D_c$ = deaths amongst recognized cases of the disease for which the rate is
calculated,

$C$ = cases of the disease.

(Case fatality rates are usually expressed as “per 100,” occasionally as “per 1000.”)

Provided age, sex, race, occupation, and locality of dwelling
are taken into account, this is the most refined form of specific
death-rate, because, in the most exclusive sense, those who have
a given disease are the most truly exposed to risk of dying of that
disease at that time. The case fatality rate for typhoid, for
example, measures the probability that a person who has typhoid
will die at that time (i. e., within the course of the attack) of that
disease.
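
As an arithmetical illustration of the definition, the rate may be computed as in the following minimal sketch. The function name `case_fatality_rate` and the figures (48 deaths among 400 recognized cases) are hypothetical and chosen only to show the calculation.

```python
def case_fatality_rate(deaths_among_cases, cases, per=100):
    """Deaths among recognized cases divided by cases, scaled to `per`."""
    if cases <= 0:
        raise ValueError("number of cases must be positive")
    if not 0 <= deaths_among_cases <= cases:
        raise ValueError("deaths among cases must lie between 0 and the number of cases")
    return per * deaths_among_cases / cases


# Hypothetical figures, for illustration only.
rate_per_100 = case_fatality_rate(48, 400)
rate_per_1000 = case_fatality_rate(48, 400, per=1000)
print(f"Case fatality rate: {rate_per_100:.1f} per 100 ({rate_per_1000:.0f} per 1000)")
```

The check on the inputs reflects the definition: the deaths counted must be deaths among the same recognized cases that form the denominator.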

Unfortunately,  our  knowledge  of  true  case  fatality  rates,  even 
for the commonest diseases, is very meager, because of the
inadequacy of the reporting of morbidity. The case fatality rate is,
of  all  the  data  of  biostatistics,  the  most  interesting  to  the  clinician, 
because  of  its  obvious  bearing  upon  prognosis.  The  most  reliable 
data  in  existence  on  case  fatality  rates  are  those  derived  from  the 
experience  of  great  hospitals.  But  these  do  not  give  a true  scientific 
picture of the situation for two reasons: First, a hospital
population is an adversely selected population. In the main, the cases
which  get  into  a hospital  are  those  in  which  the  prognosis  at  a fairly 
early  stage  of  the  disease  is  thought,  often  on  the  best  of  grounds, 
to  be  in  some  degree  unfavorable.  Consequently,  hospital  case 
fatality  rates  tend  to  be  unduly  high.  This  state  of  affairs  becomes 
grossly  exaggerated  when  it  is  the  practice  for  the  hospitals  of  a city 
to  send  to  one  particular  hospital,  usually  that  one  supported  by 
the  municipality,  the  greater  part  of  their  cases  which  upon  entrance 
are  seen  to  be  either  moribund  or  of  very  bad  prognosis. 

In  the  second  place,  the  treatment  of  a disease  in  a hospital  may 
significantly  influence,  in  a differential  manner,  the  course  of  the 
disease,  as  compared  statistically  with  the  treatment  given  on  the 
average  outside. 

There  is  a wonderful  field  open  to  the  quantitatively  inclined 
student  of  medicine,  in  the  procuring  and  biometric  analysis  of 
accurate  case  fatality  rates. 

BIRTH-RATES 

The crude birth-rate is given by

$$R_b = \frac{B}{P}$$

where

$R_b$ = crude birth-rate,

$B$ = number of births (but exclusive of still-births) in a given time, as a year,

$P$ = total living population.

(Crude birth-rates are usually stated as “per 1000” or “per 10,000.”)

This rate is obviously a most crude measure of the reproductive
capacity  of  a population.  To  begin  with,  not  all  living  persons  are 
exposed  to  the  risk  of  having  a baby.  Only  females,  and  those 
between  certain  ages  (roughly  from  ten  to  sixty  as  outside  limits) 
are liable to this occurrence. Furthermore, under existing
conditions of law and public sentiment, in the main the giving of birth
to babies is confined to married women within the age limits stated.
So then to arrive at anything like a true general measure of the
force of natality it will be essential first to differentiate between
legitimate and illegitimate births, and between living and
still-births, and in the second place, to use as the denominator of the rate
fraction  for  legitimate  babies  the  number  of  married  women  between 
the  age  limits  of  say  ten  and  sixty  years.*  For  the  illegitimate  rate 
the  denominator  must  be,  of  course,  the  unmarried  women  within 
the  same  age  limits. 
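
The effect of the choice of denominator can be seen in a small numerical sketch. All of the counts below are hypothetical, as are the names `rate`, `married_women`, and `unmarried_women`; the sketch assumes only the definitions given above, namely live births divided by the population regarded as exposed to risk.

```python
def rate(events, exposed, per=1000):
    """Events divided by the population exposed to risk, scaled to `per`."""
    if exposed <= 0:
        raise ValueError("exposed population must be positive")
    return per * events / exposed


# Hypothetical community, for illustration only.
total_population = 100_000
legitimate_births = 2_400      # live births to married women
illegitimate_births = 100      # live births to unmarried women
married_women = 12_000         # married women within the age limits
unmarried_women = 10_000       # unmarried women within the same limits

crude = rate(legitimate_births + illegitimate_births, total_population)
legitimate = rate(legitimate_births, married_women)
illegitimate = rate(illegitimate_births, unmarried_women)

print(f"Crude birth-rate:        {crude:6.1f} per 1000 population")
print(f"Legitimate birth-rate:   {legitimate:6.1f} per 1000 married women")
print(f"Illegitimate birth-rate: {illegitimate:6.1f} per 1000 unmarried women")
```

The same number of births gives very different figures according to the denominator, which is why the crude rate is a poor measure of the force of natality.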

As  to  the  reliability  and  significance  of  crude  birth-rates,  as 
commonly  calculated  with  the  total  population  for  denominator, 
much  the  same  considerations  apply  as  have  already  been  set  forth 
for  crude  death-rates.  They  can  be  used  for  comparison  of  different 
places  only  with  the  utmost  caution,  because  differences  in  the  age 
and  sex  constitution  of  the  populations  compared,  quite  regardless 
of  their  true  forces  of  natality,  may  have  most  profound  effects 
upon  the  rates.  So  long  as  the  population  of  a given  place  is 
changing  only  slowly  in  its  composition,  its  crude  birth-rates  are 
fairly  comparable  inter  se  at  different  times,  as,  for  example,  in 
successive  years.  In  the  routine  official  birth  statistics  of  the 
United  States  it  is  the  crude  birth-rate  which  is  tabulated. 

For a considerable number of years the crude birth-rate has been
falling in most civilized countries. A general conspectus of
birth-rate statistics for different countries is shown in Table 20, taken
from Knibbs5 for the years down to and including 1912, and from
the Registrar-General’s Statistical Review of England and Wales
for 1927 (Text, p. 117) for 1913 through 1927.

* The  limits  usually  taken  are  15  and  45,  50  or  55.  Actually,  however,  there 
are  occasionally  recorded  births  from  mothers  under  fifteen  and  over  fifty-five  years  of 
age.  There  are  not  many  such,  of  course,  but  still  it  is  a physiologic  fact  that  there 
is  a minute  risk  that  some  women  may  become  pregnant  and  bear  a child  at  or  very 
near  the  extreme  ages  of  ten  and  sixty  that  have  been  stated  above. 



MEDICAL  BIOMETRY  AND  STATISTICS 


TABLE  20 


Crude  Birth-rates  for  Various  Countries — 1860-1927 — Per  10,000  of  the 

Population 


Key to column headings: Austl, Australia; E&W, England and Wales; Scot, Scotland;
Irel, Ireland; IFS, Irish Free State; N.Ire, Northern Ireland; Fran, France;
Prus, Prussia; Germ, Germany; Ital, Italy; Switz, Switzerland; Norw, Norway;
Swed, Sweden; Denm, Denmark; Neth, Netherlands; Belg, Belgium; Austr, Austria;
Hung, Hungary.

[The rows for 1860-1870 are incomplete in the source, several countries having no
entry, and the column to which each figure belongs cannot be recovered with
certainty. The figures are given in the order printed.]

1860   426   343   356   262   386   348   319   306   379   381
1861   423   346   349   269   377   326   318   354   308   372   344
1862   433   350   346   265   372   334   310   332   301   379   342
1863   417   353   350   269   395   336   311   364   318   403   352
1864   429   354   356   240   266   397   379   336   303   357   315   403   345
1865   421   354   355   257   265   393   385   328   314   361   314   378   344
1866   398   352   354   262   264   393   390   331   322   354   327   379   421   350
1867   404   354   351   260   264   371   367   308   305   354   321   366   388   340
1868   405   358   353   268   257   369   354   275   312   349   325   379   424   341
1869   387   348   343   267   257   379   372   282   295   343   316   393   426   339
1870   387   352   346   277   255   383   369   298   288   305   361   323   396   417   339

Year Austl   E&W  Scot  Irel  Fran  Prus  Ital Switz  Norw  Swed  Denm  Neth  Belg Austr  Hung  Mean

1871   380   350   345   281   229   383   370   291   292   304   302   354   310   389   430   331
1872   371   356   349   278   267   397   379   300   297   300   303   360   323   391   410   339
1873   374   354   348   271   260   396   363   299   299   308   308   362   325   399   422   339
1874   368   360   356   266   262   401   349   305   307   309   309   364   326   397   427   334
1875   359   354   352   261   259   407   377   320   312   312   319   366   325   399   450   345
1876   360   363   356   264   262   407   392   330   318   308   326   371   332   400   463   350
1877   350   360   353   262   255   399   370   323   318   311   324   366   323   387   436   343
1878   354   356   349   251   252   387   362   316   311   298   317   361   315   386   431   337
1879   358   347   343   252   251   390   378   308   320   305   320   367   315   392   458   340
1880   352   342   336   247   246   378   339   298   307   294   318   355   311   380   428   323
1881   353   339   337   245   249   370   380   300   300   291   323   350   314   377   429   351
1882   345   338   335   240   248   367   371   291   309   294   324   353   312   391   438   331
1883   348   335   328   235   248   371   372   288   309   289   318   343   305   382   448   328
1884   356   336   337   239   247   376   390   285   310   300   334   349   305   387   456   334
1885   357   329   327   235   243   377   386   280   313   294   326   344   299   376   448   328
1886   354   328   329   232   239   377   370   280   309   298   325   346   296   380   456   328
1887   356   319   317   231   235   377   389   280   308   297   320   337   294   382   442   326
1888   355   312   313   228   231   374   375   278   308   288   317   337   291   379   438   322
1889   346   311   309   227   230   371   383   276   297   277   313   332   295   379   437   313
1890   350   302   304   223   218   366   358   264   303   280   306   329   287   367   403   311
1891   345   314   312   231   226   377   372   278   309   283   309   337   296   370   423   319
1892   337   304   307   225   223   363   362   274   296   270   295   320   289   362   404   309
1893   328   307   308   230   228   375   365   277   307   274   305   338   295   379   426   316
1894   308   296   299   230   223   366   355   273   298   271   301   327   290   367   415   307
1895   304   303   300   233   217   369   349   273   306   275   300   328   285   381   418   310
1896   284   296   304   237   225   369   348   281   304   272   305   327   290   380   405   309
1897   282   296   300   235   222   365   347   283   300   267   298   325   290   375   403   306
1898   271   293   301   233   218   367   335   285   303   271   302   319   286   363   377   302
1899   273   291   298   231   219   363   339   290   309   264   297   321   288   373   393   303
1900   273   287   296   227   214   361   330   286   301   270   297   316   289   373   393   301
1901   272   285   295   227   220   362   326   290   296   270   297   323   294   366   378   300
1902   267   285   293   230   217   355   334   285   289   265   292   318   284   371   389   298
1903   253   285   294   231   211   344   317   274   288   257   287   316   275   353   369   290
1904   264   280   291   236   209   347   329   273   281   258   289   314   271   356   374   290
1905   262   273   286   234   206   335   327   269   274   257   284   308   261   339   363   285
1906   266   272   286   235   206   337   321   269   267   257   285   304   257   350   365   285
1907   268   265   277   232   197   330   317   262   264   255   282   300   253   340   367   281
1908   266   267   281   233   201   327   337   264   263   257   285   297   249   337   369   282
1909   267   258   273   234   195   317   327   255   263   256   282   291   237   334   377   278
1910   268   251   262   233   196   305   333   250   261   247   275   286   237   325   357   273
1911   272   244   256   232   187   294   315   242   259   240   267   278   229   314   350   265
1912   286   238   259   230   190   289   324   241   256   237   267   281   226   313   363   267


TABLE 20, Continued

(A figure set between the IFS and N.Ire columns is for Ireland as a whole. An entry
missing in the source is shown as "...".)

Year Austl   E&W  Scot   IFS N.Ire  Fran  Germ  Ital Switz  Norw  Swed  Denm  Neth  Belg Austr  Hung  Mean

1913   282   241   255      228      182   275   317   232   251   232   256   283   224   297   343   260
1914   279   238   261      226      179   268   311   224   251   229   256   283   204   233   345   252
1915   271   218   239      220      116   204   305   195   236   216   242   263   161   184   236   220
1916   266   210   229      210       95   152   240   189   242   212   244   266   129   147   168   200
1917   263   178   203      198      105   139   195   185   251   209   237   262   113   139   160   189
1918   250   177   205      200      122   143   181   187   246   203   241   250   113   141   153   187
1919   235   185   217      200      126   200   214   186   227   198   226   244   163   180   289   206
1920   255   255   281      222      213   259   318   209   261   236   254   283   221   224   324   254
1921   250   224   252      202      207   253   303   208   240   215   240   274   218   229   316   242
1922   247   204   235   195   233   193   229   302   196   231   196   222   259   204   232   306   230
1923   238   197   228   205   239   192   210   294   194   225   188   223   260   204   225   292   226
1924   232   188   219   211   227   187   205   284   188   211   181   218   251   199   217   268   218
1925   229   183   213   208   220   189   207   278   184   200   175   210   242   198   206   283   214
1926   220   178   209   206   225   188   195   272   182   197   169   205   238   190   192   273   209
1927   217   166   198   203   213   181   183   269   174   182   161   ...   231   182   178   252   199

Mean   318   292   300      238*     220   333   337   261   279   268   290   318   270   335   376

* This mean is for Ireland as a whole for the years 1864-1921.


SPECIFIC  BIRTH-RATES 

Age  specific  birth-rates  may  be  formed  if  the  necessary  statistical 
data  are  available  in  accordance  with  exactly  the  same  principle  as 
was  used  in  forming  age  specific  death-rates.  The  number  of 
women  of  a given  age,  or  within  a given  small  age  group,  is  used  as 
the  denominator,  and  the  number  of  babies  born  in  a year  to  women 


TABLE 21

Age Specific Birth-rates Computed from Australian (1911) Data. (Data from Knibbs,5 p. 325.)

Age of           Total married    Number who bore a         Specific birth-
mothers.         women.           child during the year.    (or fertility) rate.†

19 and under        8,716               4,146                     476
20-24              65,959              25,957                     394
25-29             110,591              33,817                     306
30-34             113,310              25,682                     227
35-39             105,550              16,839                     160
40-44              95,573               6,763                      71
45 and over        82,933                 713                       9

Totals            582,632             113,917                     196

† Births per 1000 married women of indicated age.

in this age group as the numerator of the rate fraction. Such
figures measure the fertility of women of the specified class.
Matthews Duncan11 long ago showed that the fertility rate varied in
a definite and lawful manner with age. Some recent statistics
to the same purpose are presented in Table 21, adapted from
Knibbs.5

It  is  to  be  understood  that  the  figures  in  Table  21  do  not  refer 
to  first  births  only,  but  to  all  births  regardless  of  their  order.  It  is 
seen  that  the  age  specific  birth-rates  are  highest  in  the  earlier  years, 
and  decrease  in  value  with  advancing  age.  It  will  be  remembered 
that  all  Australian  birth-rates  are  relatively  high  as  compared  with 
various  other  countries. 
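
The rates in the last column of Table 21 can be recomputed from the two preceding columns. The following is a minimal sketch; the figures are those of the table, while the names `table_21` and `specific_birth_rate` are illustrative assumptions.

```python
# Table 21: (age of mothers, total married women, number who bore a child
# during the year, printed specific birth-rate per 1000 married women).
table_21 = [
    ("19 and under", 8_716, 4_146, 476),
    ("20-24", 65_959, 25_957, 394),
    ("25-29", 110_591, 33_817, 306),
    ("30-34", 113_310, 25_682, 227),
    ("35-39", 105_550, 16_839, 160),
    ("40-44", 95_573, 6_763, 71),
    ("45 and over", 82_933, 713, 9),
]


def specific_birth_rate(births, women, per=1000):
    """Births to women of a given class per `per` women of that class."""
    return per * births / women


for age, women, births, printed in table_21:
    computed = specific_birth_rate(births, women)
    print(f"{age:13s} computed {computed:6.1f}   printed {printed:3d}")

total_women = sum(row[1] for row in table_21)
total_births = sum(row[2] for row in table_21)
overall = specific_birth_rate(total_births, total_women)
print(f"{'Totals':13s} computed {overall:6.1f}   printed 196")
```

Each computed value, rounded to the nearest whole number, agrees with the printed rate, and the column totals agree with the totals line of the table.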

There is a good deal of confusion in the use of the terms
“fertility” and “fecundity.” The writer some years ago discussed*
this terminology in the following words:

“We  would  suggest  that  the  term  ‘fecundity’  be  used  only  to 
designate the innate potential reproductive capacity of the
individual organism, as denoted by its ability to form and separate
from  the  body  mature  germ  cells.  Fecundity  in  the  female  will 
depend  upon  the  production  of  ova  and  in  the  male  upon  the 
production  of  spermatozoa.  In  mammals  it  will  obviously  be  very 
difficult,  if  not  impossible,  to  get  reliable  quantitative  data  regarding 
pure  fecundity.  On  the  other  hand,  we  would  suggest  that  the 
term  ‘fertility’  be  used  to  designate  the  total  actual  reproductive 
capacity  of  pairs  of  organisms,  male  and  female,  as  expressed  by 
their  ability  when  mated  together  to  produce  (i.  e.,  bring  to  birth) 
individual  offspring.  Fertility,  according  to  this  view,  depends 
upon  and  includes  fecundity,  but  also  a great  number  of  other 
factors  in  addition.  Clearly  it  is  fertility  rather  than  fecundity 
which  is  measured  in  statistics  of  birth  of  mammals.” 

Standardized  and  corrected  birth-rates  of  populations  may  be 
calculated  on  principles  discussed  in  Chapter  IX  for  death-rates. 

* Pearl, R., and Surface, F. M.: Data on the Inheritance of Fecundity Obtained
from the Records of Egg Production of the Daughters of “200-egg” Hens, Maine
Agr. Exp. Sta. Annual Report, 1909, pp. 49-84.

MORBIDITY RATES

The fundamental equation for a crude morbidity rate is as
follows:

$$R_m = \frac{M}{P}$$

where

$R_m$ = crude morbidity rate,

$M$ = number of persons sick, either from all causes together or from some
one particular cause (in the latter case the rate, of course, is the crude
morbidity rate for that disease) in a given stated time,

$P$ = the total population.

(Morbidity rates are stated sometimes as “per 1000,” sometimes as “per 10,000,”
and sometimes as “per 100,000.”)

Such a figure measures the incidence rate of sickness in the
population, either in general or for particular diseases. It is
subject to many, if not all, of the same difficulties that crude death-
and birth-rates are. Unfortunately, however, there exist relatively
so few statistics regarding morbidity that it is somewhat
academic to be too critical regarding any morbidity rates.
Anything in the nature of age and sex specific morbidity rates is
practically non-existent at the present time.
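
Since morbidity rates are stated on several different bases, it may help to see the same crude rate expressed on each of them. The following is a minimal sketch with hypothetical figures (350 persons sick in a population of 70,000); the function name `morbidity_rate` is an assumption made for illustration.

```python
def morbidity_rate(persons_sick, population, per):
    """Persons sick divided by the total population, scaled to `per`."""
    if population <= 0:
        raise ValueError("population must be positive")
    return per * persons_sick / population


# Hypothetical figures, for illustration only.
persons_sick = 350
population = 70_000

for base in (1_000, 10_000, 100_000):
    value = morbidity_rate(persons_sick, population, base)
    print(f"{value:8.1f} per {base:,}")
```

The three figures describe one and the same rate; only the base of statement differs, and the base must therefore always be given with the rate.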

But there is no doubt that morbidity statistics are, by and
large, of all statistics the most potentially valuable to the
administrative public health official. The United States Public Health
Service is taking a leading position in the development of morbidity
statistics. The student who is particularly interested in the
subject should apply to that Service for publications.

It  is  not  fair  to  measure  the  effectiveness  of  public  health  work 
entirely  in  terms  of  mortality,  because  much  of  its  effectiveness  in 
actual  fact  has  nothing  to  do  with  mortality,  but  with  morbidity. 
This fact shows itself in every-day language. We have boards of
health, not boards of mortality, and quite rightly so. Some of the
human  ailments  against  which  public  health  work  directs  its  most 
effective  work  are  diseases  which  at  the  worst  are  not  particularly 
fatal.  An  example  is  uncinariasis — hookworm  disease.  It  would 
be  folly  to  attempt  to  measure  the  social  worth  of  the  campaign 
against  this  distressing  ailment  in  terms  of  mortality.  What  this 
work  accomplishes  is  not  primarily  a reduction  in  mortality,  but  a 
positive increase in the sum total of human happiness and
well-being, individual, social, and economic. The same considerations
apply to many other lines of public health work, indeed, to most of
them. The most important causes of death, taken by and large,
are  not  the  ones  against  which  hygiene  and  sanitation  are,  in  the 
present  state  of  knowledge  and  of  the  organization  of  society, 
particularly  effective.  But  this  fact  should  in  nowise  be  taken  to 
mean  that  public  health  efforts  have  no  great  value. 

DEATH  RATIOS 

A death ratio measures the probability that in a given total
number of deaths from all causes a particular one will be from one
particular cause, say tuberculosis of the lungs. The fundamental
equation is

$$R_{tD} = \frac{D'}{D}$$

where

$R_{tD}$ = the death ratio,

$D'$ = deaths from a particular cause (or group of causes) in a specified time
interval,

$D$ = total deaths from all causes in the same time interval.

(Death ratios are usually stated as “per 100,” or “per 1000.”)

This statistical constant has been much criticized, and has in
consequence largely fallen out of general use, on the ground that
both D' and D are variable quantities affected by the same
biologic forces, and that in consequence it is never possible to tell
with any degree of accuracy what portion of the derived value
of RtD is due specifically to D' and what to D. Undue weight
has undoubtedly been given to this criticism. In principle the
same criticism applies to any rate, for P in a crude death- or
birth-rate, or any more precisely defined part of P, is not an invariable
quantity. As a matter of fact RtD may be a very valuable
statistical datum if used intelligently, and there is no statistical datum
whatever that can be relied upon to give correct results if
unintelligently employed. The criterion as to the usefulness of
RtD is simply and solely whether the probability which it measures
is, in the particular premises set by the study in hand, an intelligible
probability. If it is, RtD has validity and usefulness.
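
The calculation of a death ratio is a simple proportion, as the following minimal sketch shows. The figures (120 deaths from tuberculosis of the lungs among 1500 deaths from all causes) are hypothetical, and the function name `death_ratio` is an assumption made for illustration.

```python
def death_ratio(deaths_from_cause, total_deaths, per=100):
    """Deaths from one cause divided by deaths from all causes, scaled to `per`."""
    if total_deaths <= 0:
        raise ValueError("total deaths must be positive")
    if not 0 <= deaths_from_cause <= total_deaths:
        raise ValueError("deaths from the cause cannot exceed total deaths")
    return per * deaths_from_cause / total_deaths


# Hypothetical figures, for illustration only.
ratio = death_ratio(120, 1_500)
print(f"Death ratio: {ratio:.1f} per 100 deaths from all causes")
```

The population does not enter into the calculation at all, which is what distinguishes a death ratio from a death-rate.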

The death ratio has in recent years been most effectively
employed in researches on tuberculosis by Greenwood and Tebb.
It has been employed by Arne Fisher12 as a basis for computing life
tables from a knowledge of deaths alone.

THE  BIRTH-DEATH  RATIO  OR  VITAL  INDEX 

The writer7 has elsewhere suggested that the term “vital index”
be used to designate that measure of a population’s condition which
is given by the ratio of births to deaths within a given time. It
may fairly be said that there is no other statistical constant which
furnishes so adequate a picture as this of the net biologic status
of a population as a whole at any given moment. If the ratio
100 Births/Deaths is greater than 100, the population is in a
growing and in so far healthy condition. If it is less than 100, the
population is biologically not holding its own. Depopulation may not be
actually occurring if there is a sufficient amount of immigration to
make up the deficiency in births. But fundamentally and innately
the condition is not a sound one from a biologic standpoint, though
under certain circumstances it may conceivably be from a social
standpoint. It is curious, in view of the obvious significance of
this statistic, the vital index of a population, that so little attention
is paid to it by demographers. It is a highly sensitive measure of
the immediate biologic status, in the evolutionary sense, of a
nation or any subgroup of people. Wernicke* discussed it in 1889,
but did not use it in the most effective manner or form. Sundbärg†
proposed its use as a “measure of civilization” of different peoples.
Rubin‡ criticized Sundbärg, but only in respect of technic,
proposing as a measure of civilization D²/B in place of D/B, where
D = deaths and B = births. Pell§ has dealt with the idea
implicit in the birth/death ratio, but in an inadequate manner.
The  most  extensive  and  comprehensive  discussion  of  the  vital 

* Wernicke, J.: Das Verhältniss zwischen Geborenen und Gestorbenen in
historischer Entwicklung und für die Gegenwart in Stadt und Land, Jena, 1889, vi and
91 pp. 8vo.

† Sundbärg, G.: Dodstalen sassom Kulturmatare, Nationalokonomiska
Foreningens Forhandlingar, i Aaret, 1895, Stockholm, 1896.

‡ Rubin, M.: A Measure of Civilization, Jour. Roy. Stat. Soc., vol. 60, pp. 148-
161, 1897.

§ Pell, C. E.: The Law of Births and Deaths, London (Unwin), 1921, 192 pp.

index in the literature is that by Sweeney.15 He computed the
vital index for all countries of the world for which adequate
statistics were available, and over as long a period of time in each
case as he could. The student should read his book.

In Table 22 ... total births and deaths of each state in the Birth Registration Area
for the years 1915 to 1918 inclusive.

The significance of the several indices is as follows:

$$\text{Vital index } A = \frac{100 \times (\text{births of whites of native parents})}{\text{deaths of all native whites}}$$


In  this  index  the  births  and  deaths  come  from  an  identical 
group  of  the  population.  The  children  born  were,  of  course,  native, 
and  their  parents  were  also  native  born.  The  deaths  were  of 
native  born,  i.  e.,  the  same  group  as  the  parents  of  the  births.  All 
racial  elements  (white)  are  included  in  births  and  deaths,  but  all 
are  Americans  in  the  sense  of  nativity. 


$$\text{Vital index } B = \frac{100 \times (\text{births of whites, both parents foreign})}{\text{deaths of foreign-born whites}}$$


Here  again  both  births  and  deaths  come  from  an  identical  group. 
The  births  are  children  of  foreigners  in  this  country.  The  deaths 
are  of  foreigners  in  this  country. 


$$\text{Vital index } C = \frac{100 \times (\text{births of negroes})}{\text{deaths of negroes}}$$


This  needs  no  discussion. 


$$\text{Vital index } D = \frac{100 \times (\text{births of whites})}{\text{deaths of whites}}$$


This  is  for  comparison  with  C.  Both  C and  D are  true  vital 
indices,  in  the  sense  that  the  parents  of  the  births  in  the  numerator 
are  drawn  from  the  same  population  group  as  the  deaths  in  the 
denominator. 
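
Each of the four indices is computed in the same way, as 100 times the births of a group divided by the deaths of the corresponding group. The following is a minimal sketch; the counts are hypothetical and do not come from Table 22, and the function names `vital_index` and `describe` are assumptions made for illustration.

```python
def vital_index(births, deaths):
    """100 times births divided by deaths, for one population group."""
    if deaths <= 0:
        raise ValueError("deaths must be positive")
    return 100.0 * births / deaths


def describe(index):
    """Interpret the index relative to the dividing line of 100."""
    if index > 100:
        return "births exceed deaths"
    if index < 100:
        return "births fall short of deaths"
    return "births and deaths balance"


# Hypothetical counts of (births, deaths), for illustration only.
groups = {
    "Group with excess of births": (2_300, 2_000),
    "Group with excess of deaths": (900, 1_000),
}

for name, (births, deaths) in groups.items():
    index = vital_index(births, deaths)
    print(f"{name}: vital index {index:.1f} ({describe(index)})")
```

The index is meaningful only when, as the text requires, the births in the numerator and the deaths in the denominator are drawn from the same population group.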

Unfortunately, on the basis of present published official
compilations of statistics, these four are the only significant vital indices
which can be drawn up. For any really deep understanding of
what the biologic effect is of racial fusion, and of a new
environment, on the net vitality of populations we ought to have a whole
series  of  racially  specific  vital  indices.  Here  again  there  is  little 
practical  hope  of  getting  these  from  purely  official  sources.  Some 
one  must  come  forward  and  finance  a comprehensive  and  thorough 
investigation  along  these  lines  from  outside. 

The facts about Indices A, B, C, and D are set forth in Table 22.
In this table a figure in italics indicates that the absolute number of
births and deaths on which the index is based is in each case less
than 100. It will be noted that there are few such cases, and
that they are practically all among the negroes of the northern
states.

This table presents many points of interest. We may compare
vital  indices  A and  B,  which  indicate  the  relative  biologic  vigor  of 
the  native-born  and  the  foreign-born  populations  in  this  country. 
Taking  totals  (the  last  line  of  the  table)  we  note  that  for  each  year 
Index  B is  much  larger  than  Index  A.  Generally  speaking  the 
foreign  population  produced  in  this  country  approximately  two  or 
more  babies  for  every  death,  on  the  average  during  the  years 
here  studied  with  the  exception  of  the  last.  The  native  population 
(as  defined  in  Vital  Index  A)  produced  only  a small  fraction  over 
one  baby  for  each  death.  In  other  words,  the  portion  of  the  native 
population  dealt  with  in  Table  22,  even  when  so  broadly  defined 
as  by  Index  A,  was,  in  the  period  1915-18,  in  about  the  same  state 
as  France  before  the  war,  and  not  in  as  vigorous  a state  as  the 
French  population  was  in  1920  and  1921. 

The  vital  indices  of  Table  22  are  crude  indices.  We  need  age- 
specific  vital  indices  for  native-  and  foreign-born  populations. 

Let  us  put  the  matter  in  this  way:  Suppose  that  a gigantic 
corral  were  constructed  with  two  compartments.  Suppose  that, 
further,  there  were  put  into  one  of  these  compartments,  on  a given 
date,  all  the  native-born  women  aged  twenty  to  twenty-four 
inclusive  say,  while  into  the  other  compartment  were  put  all  the 
foreign-born  women  in  the  country  of  the  same  ages.  Suppose 
them all to be told that they were to stay there for one year, but
that men could have free access to the corrals for purposes of reproduction.
Finally, suppose that similar corrals were constructed,
and  the  women  impounded  in  them,  for  each  age  group,  from  say 
ten  to  fourteen  at  one  extreme  to  fifty-five  and  over  at  the  other 
extreme. 

In  any  one  compartment  of  any  one  corral  during  the  year 
(a)  some  of  the  women  would  have  babies,  and  (b)  some  of  the 
women  would  die.  If  we  kept  statistical  record  of  these  events  we 
could,  at  the  end  of  the  year,  calculate  the  age  specific  vital  index 
for  each  group  of  women.  It  would  not  be  the  general  population 
vital  index  because  no  male  deaths  were  included.  But  it  would  be 
an  age-specific  vital  index  for  the  females  as  reproductive  units. 

The results of exactly such an experiment for the women of the
Birth Registration Area in the year 1919 are shown in Table 23.

TABLE 23

Age-Specific Vital Indices for Native-born and Foreign-born Women in
B. R. A. 1919

Ages.          Births from    Deaths of      Vital indices  Births from    Deaths of      Vital indices
               mothers born   native-born    for native     foreign-born   foreign-born   for foreign
               in U. S.       females.       women.         mothers.       females.       women.

10-14          391            5,002          7.82           15             268            5.60
15-19          77,048         7,763          992.50         10,768         759            1418.71
20-24          258,876        11,854         2183.87        74,247         2,120          3502.22
25-29          250,548        13,189         1899.67        102,429        3,317          3088.00
30-34          166,777        11,813         1411.81        83,326         3,583          2325.59
35-39          101,638        10,603         958.58         56,414         3,723          1515.28
40-44          33,832         9,511          355.71         18,878         3,566          529.39
45-49          3,202          10,092         31.73          1,866          4,120          45.29
50-54          68             10,926         .62            54             4,968          1.09
55 and over    26             96,919         .03            13             47,478         .02

Totals         892,406        187,672                       348,010        73,902

The  figures  in  Table  23  show  plainly  enough  that  at  every  age 
between  fifteen  and  fifty-four  inclusive  the  foreign-born  women 
have  higher  specific  vital  indices  than  native-born  women.  How 
much  so  is  shown  graphically  in  Fig.  59. 
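
The arithmetic behind Table 23 can be checked with a short sketch. It assumes only the definition used in this chapter (100 times births divided by deaths in the same group) and takes three rows of the table as input; the names `table_23` and `vital_index` are illustrative and not from the original.

```python
# Age-specific vital index = 100 * births / deaths, for one group and period.
# Births and deaths below are copied from three rows of Table 23.
table_23 = {
    "15-19": {"native": (77_048, 7_763), "foreign": (10_768, 759)},
    "20-24": {"native": (258_876, 11_854), "foreign": (74_247, 2_120)},
    "25-29": {"native": (250_548, 13_189), "foreign": (102_429, 3_317)},
}


def vital_index(births, deaths):
    """Return births per 100 deaths (illustrative helper)."""
    return 100.0 * births / deaths


for ages, groups in table_23.items():
    native = vital_index(*groups["native"])
    foreign = vital_index(*groups["foreign"])
    print(f"{ages}: native {native:.2f}, foreign {foreign:.2f}")
```

For the ages twenty to twenty-four this reproduces the tabled indices of 2183.87 and 3502.22, that is, roughly 22 and 35 births per death.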

As  a reproductive  machine  the  foreign-born  woman  far  excels 
the  native  born.  For  each  native-born  woman  dying  between 
twenty  and  twenty-four  years  of  age,  the  native-born  women  as  a 
group  produce  approximately  22  babies.  But  for  each  foreign- 
born  woman  dying  between  twenty  and  twenty-four,  the  foreign- 
born  women  as  a whole  produce  35  babies.  It  is  in  these  five 
years  that  women,  under  conditions  of  life  as  now  socially 
organized  in  the  United  States,  do  their  best  work  biologically 
for  the  race,  “best”  being  taken  here  in  the  sense  of  biologic 
efficiency  and  economy. 

If  we  had  specific  vital  indices  for  populations  of  lower  animals 
in  different  environmental  situations  we  should  be  in  a position  to 
know a great deal more than we now do as to the method of evolution.
For it is the net balance between births and deaths which is
the  most  significant  information  that  can  be  had  about  the  progress 
of  the  struggle  for  existence. 

It may be objected in Table 23 that we have put all births
(both male and female) against only female deaths. The thought
in doing this was that, after all, females have to produce all the
babies, whether the latter are boys or girls. If one wishes to
postulate the problem in this way: how many new reproductive
machines (females) do women of a specified age produce as a class
for each similar reproductive machine lost by death? then, of
course, one should take only female births in computing the specific
vital indices. The result would be, of course, that the births and
consequently the indices in Table 23 would be about one-half as
large absolutely as they really are in that table, but the general
form of the curve of Fig. 59 would be unchanged.

Fig. 59. — Showing the differences in specific vital indices for native-born and
foreign-born women in 1919. Solid line, native-born women; dash line, foreign-born
women. (Axis label: age of women.)

In the Eighth Annual Report of the Census on “Birth, Stillbirth,
and Infant Mortality Statistics” for the year 1922 (Washington,
Government Printing Office, 1924) Table M (pp. 17, 18)
and Table N (p. 19) give age-specific vital indices for the age group
15-44, and separately in five-year age groupings for native whites,
foreign-born whites, and negroes, covering the three years 1920-22.
These figures will repay careful study. The Report (p. 17) comments
on them as follows:

“For  native  women  aged  fifteen  to  forty-four,  the  three  highest  indices  are  for 
Utah  (3100),  Nebraska  (3014),  and  Virginia  (2898.7),  and  the  three  lowest  are  for 
California  (1378.7),  Massachusetts  (1424.7),  and  New  York  (1592.9). 

“Native  white  women  aged  twenty  to  twenty-four  have  a vital  index  of  3630.6, 
while  foreign-born  white  women  of  the  same  age  have  a vital  index  of  4795.3.  For 
native white women of this age the lowest vital index (2592.3) appears for Massachusetts
and the highest (5808.6) for Nebraska. For foreign-born white women of
this  age  comparatively  high  indices  appear,  for  example,  for  Pennsylvania  (6697.4) 
and  for  Connecticut  (5822.2).  Similar  differences  throughout  the  table  emphasize 
once  more  the  fact  that  foreign-born  white  women  as  a class  have  more  children  than 
native  white  women.  The  vital  indices  available  for  Negroes  are  much  lower  than 
those  for  native  white  women.” 


LeBlanc14 has discussed age specific vital indices for the Japanese
population.

For further discussion of vital indices see Pearl and Burger,8
Pearl,* and Miner.†

* Pearl, R.: Seasonal Fluctuations in the Vital Index of a Population, Proc.
Nat. Acad. Sci., vol. 8, pp. 76-78, 1922.

† Miner, J. R.: The Probable Error of the Vital Index of a Population, Ibid.,
vol. 8, pp. 106-108, 1922.

CHAPTER VIII

LIFE  TABLES 

A life  table  is  a particular  conventional  method  of  presenting 
the  most  fundamental  and  essential  facts  about  the  age  distribution 
of  mortality.  It  has  many  points  of  usefulness.  The  chief  one, 
and  the  one  which  is  mainly  responsible  for  having  secured  for  life 
tables  the  position  of  respectability  and  importance  that  they  now 
enjoy,  is  that  on  them  depends  the  successful  operation  of  the  great 
commercial  enterprise  which  is  somewhat  naively  called  “life 
insurance.”  But  beyond  all  this  commercial  application  life  tables 
have,  in  respect  of  their  fundamental  structure,  an  essential  place 
in  vital  statistics.  It  is  impossible  for  the  student  fully  to  grasp  the 
significance  of  certain  matters  which  will  be  discussed  as  we  proceed 
unless  he  knows  beforehand  the  main  features,  at  least,  of  the 
anatomy  of  a life  table.  It  is  to  furnish  this  background  that  the 
present  chapter  finds  a place  in  this  book.  It  is  not  the  intention  to 
go  at  all  into  the  details  as  to  how  life  tables  are  constructed,  for  two 
reasons:  In  the  first  place,  there  is  an  extensive  and  easily  available 
literature  on  the  subject.  In  the  second  place,  the  details  of 
actuarial  science  are  not  likely  to  be  of  immediate  interest  or  use 
to  the  medical  man. 

THE  ANATOMY  OF  A LIFE  TABLE 

Suppose one could so arrange affairs that 100,000 babies would
be born all at the same identical instant of time, and in such circumstances
that each one could be observed then and subsequently
without break of continuity in the observations until the very last
one had died as a centenarian. If a record were kept of the course
of events, something like this would be bound to emerge. Some
of the 100,000 babies would die in the first day after birth. Let us
say there were observed to be d1 of these. Then on the morning
of the second day there would be surviving out of the original
100,000 who started life together the day before only

$$l_1 = 100{,}000 - d_1.$$

It is perceived that when this experiment started there were
exposed to risk of dying within the first day, or, in other words, within
the first twenty-four hours after birth, 100,000 individuals. Within
this time period there actually died d1 individuals. Therefore it
follows from the principles laid down in the last chapter that the
specific death-rate in this first day, provided we consider a day as a
not further divisible unit or instant of time, which is to say that
we consider the whole 100,000 to be exposed to risk over the whole
day,* is

$$q_1 = \frac{d_1}{100{,}000}.$$

But both our observations and the babies are continuing. In
the second day d2 individuals were observed to die. Hence on the
morning of the third day there were surviving

$$l_2 = (100{,}000 - d_1) - d_2$$

and the death-rate during the second day was, on the same assumptions
as before,

$$q_2 = \frac{d_2}{100{,}000 - d_1}.$$

We  have  postulated  that  these  observations  are  to  be  carried 
on  without  break  until  the  last  one  of  the  original  group  has  passed 
away.  If  so,  the  bookkeeping  at  the  end  of  the  process  will  at  least 
contain  columns  as  follows: 


x
(Age, in days, months, years, or whatever units one pleases, but
best stated as an interval.)

dx
(The number dying within the age interval stated in the x column.)

lx
(The number surviving at the beginning of the age interval stated in
the x column.)

qx
(The rate dx/lx, i. e., the number dying in the age interval given in
the x column divided by the number of survivors at the beginning of
that interval.)

x         dx        lx        qx

0-1                 100,000
1-2
etc.

* This assumption is, of course, of an arbitrary character. Actually the exposed
to risk over the whole day is the integration of the number exposed to risk at each
infinitesimal instant of time in the whole day. But what is here attempted is only to
give the medical reader an understanding of the gross anatomy of a life table. If
he wants a knowledge of the microscopic anatomy he must get a text which treats of
that subject. References to such are given at the end of the chapter.

This is the skeleton of a life table. To this skeleton there are
sometimes  added  certain  other  functions  derived  from  these  three, 
dx,  lx,  and  qx.  For  the  vital  statistician  two  of  these  functions 
only  are  of  particular  interest  and  importance.  The  first  of  these 
is  what  is  called  the  “expectation  of  life,”  but  in  the  interest  of 
accuracy  should  always  be  called  the  “mean  after  lifetime.”  It  is 
designated  as  e symbolically.  It  gives  the  number  of  years  which 
will,  on  the  average,  be  subsequently  lived  by  each  person  who  has 
attained  any  stated  age.  The  expectation  of  life  at  birth  is  the 
average  age  at  death  of  all  the  100,000  who  start  life  together. 
But  it  should  always  be  kept  in  mind  that  the  average  age  at 
death  of  persons  in  the  general  population  does  not  usually  give 
the  expectation  of  life  at  birth  of  the  same  people.  This  would 
only  be  true  if  the  age  distribution  of  the  living  population  were 
identical  with  that  of  the  stable  life  table  population  Lx.  Further- 
more, the  mean  age  at  death  of  one  population  is  not  comparable 
with  the  same  constant  from  another  population,  unless  the  two 
populations  have  identical  age  distributions  of  the  living.  This 
fact  was  first  pointed  out  by  Farr  many  years  ago. 

The second important derived constant of a life table is Lx,
which gives, by age groups, the stationary living population, unaffected
by emigration and immigration, which, assuming the
mortality rates given by qx, would result if 100,000 persons were
born alive uniformly throughout each year. One important use
of this figure will appear in a later chapter.
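
The bookkeeping described in this section can be sketched in a few lines. The sketch assumes the definitions given above (lx as survivors at the beginning of each interval, qx = dx/lx) and uses, purely for illustration, the first five annual dx values of Glover's table reproduced as Table 24 in this chapter; the function name `life_table_skeleton` is hypothetical.

```python
def life_table_skeleton(deaths, radix=100_000):
    """Build the x, dx, lx, qx columns from deaths per interval.

    `deaths[i]` is the number dying in interval i to i+1 out of an
    initial cohort of `radix` born at the same instant.
    """
    rows = []
    alive = radix
    for age, died in enumerate(deaths):
        rows.append({"x": age, "dx": died, "lx": alive, "qx": died / alive})
        alive -= died  # survivors entering the next interval
    return rows


# First five annual dx values from Table 24 (both sexes, 1910).
deaths_by_year = [11_462, 2_446, 1_062, 666, 477]

for row in life_table_skeleton(deaths_by_year):
    print(f"{row['x']}-{row['x'] + 1}: lx={row['lx']:,} dx={row['dx']:,} "
          f"1000qx={1000 * row['qx']:.2f}")
```

The lx values so obtained agree with the published column; the computed rates may differ from the published 1000qx in the last decimal place because the published table was not rounded from these integers.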

HUMAN  LIFE  TABLES 

In order that the reader may have a still more concrete realization
of what a life table looks like, Table 24 and Figs. 60, 61, and 62
are inserted. The table is that portion of Glover’s1 life table for
both sexes in the original registration states in 1910, which carries
the constants in which we are here interested.

TABLE 24

Life Table for Both Sexes in the Original Registration States, 1910.
(Glover’s Table 2.)

Columns:
1. Age interval, x to x+1: period of lifetime between two exact ages.
2. lx: of 100,000 persons born alive, number alive at beginning of age interval.
3. dx: of 100,000 persons born alive, number dying in age interval.
4. 1000qx: rate of mortality per thousand (number dying in age interval among
   1000 alive at beginning of age interval).
5. ex: complete expectation of life (average length of life remaining to each
   one alive at beginning of age interval), in years.
6. Lx: stationary population* (population in current age interval, including
   only those in current month or year of age).

INFANT MORTALITY — FIRST YEAR OF LIFE BY AGE INTERVALS OF ONE MONTH

Months.     lx        dx       1000qx    ex      Lx
                               (monthly  (in
                               rate)     years)

0-1         100,000   4377     43.77     51.49   8,060
1-2         95,623    1131     11.83     53.76   7,921
2-3         94,492    943      9.98      54.32   7,835
3-4         93,549    801      8.57      54.78   7,762
4-5         92,748    705      7.60      55.17   7,700
5-6         92,043    635      6.90      55.51   7,644
6-7         91,408    579      6.33      55.81   7,593
7-8         90,829    533      5.87      56.08   7,547
8-9         90,296    492      5.45      56.33   7,504
9-10        89,804    456      5.08      56.56   7,465
10-11       89,348    421      4.72      56.76   7,428
11-12       88,927    389      4.38      56.95   7,394

LIFE TABLE FOR WHOLE RANGE OF LIFE BY AGE INTERVALS OF ONE YEAR

Years.      lx        dx       1000qx    ex      Lx
                               (annual   (in
                               rate)     years)

0-1         100,000   11,462   114.62    51.49   91,853
1-2         88,538    2,446    27.62     57.11   87,095
2-3         86,092    1,062    12.34     57.72   85,529
3-4         85,030    666      7.83      57.44   84,683
4-5         84,364    477      5.65      56.89   84,116
5-6         83,887    390      4.66      56.21   83,692
6-7         83,497    327      3.91      55.47   83,333
7-8         83,170    274      3.30      54.69   83,033
8-9         82,896    234      2.82      53.87   82,779
9-10        82,662    204      2.47      53.02   82,560
10-11       82,458    187      2.27      52.15   82,365
11-12       82,271    180      2.19      51.26   82,181
12-13       82,091    182      2.22      50.37   82,000
13-14       81,909    193      2.36      49.49   81,812
14-15       81,716    210      2.57      48.60   81,611
15-16       81,506    232      2.84      47.73   81,390
16-17       81,274    256      3.16      46.86   81,146
17-18       81,018    285      3.52      46.01   80,875
18-19       80,733    315      3.89      45.17   80,576
19-20       80,418    344      4.28      44.34   80,246
20-21       80,074    375      4.68      43.53   79,887
21-22       79,699    398      5.00      42.73   79,500
22-23       79,301    412      5.19      41.94   79,095
23-24       78,889    418      5.29      41.16   78,680
24-25       78,471    425      5.42      40.38   78,259

* Unaffected by emigration and immigration, which, assuming the mortality rates in column 4, would
result if 100,000 persons were born alive uniformly throughout each year.


TABLE 24 — Continued

LIFE TABLE FOR WHOLE RANGE OF LIFE BY AGE INTERVALS OF ONE YEAR

Years.      lx        dx       1000qx    ex      Lx
                               (annual   (in
                               rate)     years)

25-26       78,046    432      5.54      39.60   77,830
26-27       77,614    440      5.67      38.81   77,394
27-28       77,174    451      5.85      38.03   76,949
28-29       76,723    465      6.06      37.25   76,491
29-30       76,258    479      6.28      36.48   76,019
30-31       75,779    493      6.51      35.70   75,532
31-32       75,286    511      6.78      34.93   75,030
32-33       74,775    530      7.09      34.17   74,510
33-34       74,245    550      7.40      33.41   73,970
34-35       73,695    568      7.72      32.66   73,411
35-36       73,127    588      8.04      31.90   72,833
36-37       72,539    605      8.33      31.16   72,237
37-38       71,934    617      8.59      30.42   71,626
38-39       71,317    631      8.84      29.68   71,001
39-40       70,686    644      9.11      28.94   70,364
40-41       70,042    658      9.39      28.20   69,713
41-42       69,384    674      9.72      27.46   69,047
42-43       68,710    693      10.09     26.73   68,364
43-44       68,017    716      10.52     25.99   67,659
44-45       67,301    740      10.99     25.26   66,931
45-46       66,561    766      11.52     24.54   66,178
46-47       65,795    795      12.08     23.82   65,397
47-48       65,000    821      12.63     23.10   64,589
48-49       64,179    846      13.18     22.39   63,756
49-50       63,333    873      13.77     21.69   62,897
50-51       62,460    897      14.37     20.98   62,012
51-52       61,563    929      15.08     20.28   61,098
52-53       60,634    970      16.01     19.58   60,149
53-54       59,664    1025     17.17     18.89   59,151
54-55       58,639    1084     18.49     18.21   58,097
55-56       57,555    1153     20.03     17.55   56,978
56-57       56,402    1225     21.72     16.90   55,790
57-58       55,177    1289     23.37     16.26   54,532
58-59       53,888    1346     24.97     15.64   53,215
59-60       52,542    1404     26.73     15.03   51,840
60-61       51,138    1462     28.58     14.42   50,407
61-62       49,676    1521     30.62     13.83   48,915
62-63       48,155    1587     32.96     13.26   47,361
63-64       46,568    1656     35.55     12.69   45,740
64-65       44,912    1718     38.25     12.14   44,053
65-66       43,194    1773     41.06     11.60   42,308
66-67       41,421    1826     44.08     11.08   40,508
67-68       39,595    1877     47.41     10.57   38,657
68-69       37,718    1928     51.12     10.07   36,754
69-70       35,790    1974     55.14     9.58    34,803
70-71       33,816    2013     59.52     9.11    32,810
71-72       31,803    2044     64.29     8.66    30,781
72-73       29,759    2065     69.38     8.22    28,726
73-74       27,694    2072     74.82     7.79    26,658
74-75       25,622    2070     80.78     7.38    24,587
75-76       23,552    2057     87.37     6.99    22,523
76-77       21,495    2028     94.35     6.61    20,481
77-78       19,467    1981     101.74    6.25    18,476
78-79       17,486    1920     109.78    5.90    16,526
79-80       15,566    1854     119.10    5.56    14,639
80-81       13,712    1786     130.28    5.25    12,819
81-82       11,926    1696     142.17    4.96    11,078
82-83       10,230    1565     153.06    4.70    9,448
83-84       8,665     1409     162.58    4.45    7,960
84-85       7,256     1255     172.97    4.22    6,628
85-86       6,001     1103     183.80    4.00    5,449
86-87       4,898     954      194.85    3.79    4,421
87-88       3,944     816      206.84    3.58    3,536
88-89       3,128     689      220.13    3.39    2,784
89-90       2,439     571      234.31    3.20    2,154
90-91       1,868     466      249.62    3.03    1,635
91-92       1,402     371      264.66    2.87    1,216
92-93       1,031     289      279.90    2.73    886
93-94       742       219      295.12    2.59    633
94-95       523       162      310.17    2.47    442
95-96       361       117      325.02    2.35    302
96-97       244       83       339.74    2.24    202
97-98       161       57       354.55    2.14    132
98-99       104       39       369.73    2.04    85
99-100      65        25       385.46    1.95    53
100-101     40        16       401.91    1.85    32
101-102     24        10       419.14    1.76    19
102-103     14        6        437.37    1.67    11
103-104     8         4        456.77    1.59    6
104-105     4         2        477.48    1.50    3
105-106     2         1        500.22    1.41    2
106-107     1         1        524.82    1.33    1
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The  following  diagrams  illustrate  the  important  functions  of  a 
life table. The first (Fig. 60) shows the form of the life table
specific death-rate curve (qx), being the plot of this column of
Table 24 above.

Fig. 60. — Annual mortality rate per thousand. The original registration states, both
sexes, 1910 (from Glover,1 p. 243). (Axis label: age in years.)

The next diagram (Fig. 61) shows the form of the lx curve.
Here the data for a number of different countries are included.
The picture shows in a striking way the usefulness of the life table
method in the comparative study of mortality.

Fig. 61. — Number of survivors out of 100,000 born alive. Australia, England,
Germany, India, Italy, Sweden, and whites in the original registration states. Males,
1901-10 (from Glover,1 p. 260).

The next diagram (Fig. 62) shows the form of the dx curve, and
again the life tables of several countries are drawn upon for comparison.

Fig. 62. — Number of deaths out of 100,000 born alive. Australia, England,
Germany, India, Italy, Sweden, and whites in the original registration states. Males,
1901-10 (from Glover,1 p. 270).

A LIFE  TABLE  NOMOGRAM 

As a further help toward understanding the structure and meaning
of life tables a nomogram devised by Pearl and Reed4 may be
presented here.

If dx be used, as in what has preceded, to indicate the deaths
at age x, or, more correctly, the number of persons dying in the
interval from x to x + 1, where 1 denotes one unit of age, which
theoretically can be made as small as one likes, then

$$l_x = d_x + d_{x+1} + d_{x+2} + \cdots + d_{\omega} = \sum_{x}^{\omega} d_x,$$

denotes  the  number  of  survivors  at  age  x,  and  we  may  define  a 
quantity 

$$L_x = \frac{l_x + l_{x+1}}{2},$$

which  will  be  the  number  living  between  any  two  ages  x and  x + 1. 

The instantaneous death rate, or the probability of dying in the
age interval x to x + 1 is

$$q_x = \frac{d_x}{l_x}.$$

Let  us  define  another  quantity  as 

$$T_x = L_x + L_{x+1} + L_{x+2} + \cdots + L_{\omega} = \sum_{x}^{\omega} L_x,$$

which  gives  the  population  in  current  and  all  older  age  intervals. 

Then the expectation of life, or mean after-life time at age x
will be

$$e_x = \frac{T_x}{l_x}.$$

The  reciprocal  of  this  last  quantity,  or  lx/Tx  will  give  the  death- 
rate  at  age  x and  over. 

Finally  the  number  living  at  age  x per  death  at  that  age  will 
be  the  reciprocal  of  qx  or  lx/dx. 
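
The derived quantities just defined can be illustrated with a minimal sketch. It assumes the formulas stated above and uses only the last seven rows of Table 24 (ages 100 to 106) as input, so that the summation for Tx is complete; the function names are hypothetical, and the published Lx values are integers, so the averaged values need not match them exactly at the oldest ages.

```python
# Last seven rows of Table 24 (ages 100-101 through 106-107).
ages = [100, 101, 102, 103, 104, 105, 106]
lx = [40, 24, 14, 8, 4, 2, 1]        # survivors at start of each interval
Lx_table = [32, 19, 11, 6, 3, 2, 1]  # published stationary population


def stationary_population(lx_column):
    """L_x = (l_x + l_{x+1}) / 2, taking l as zero after the last age."""
    padded = list(lx_column) + [0]
    return [(padded[i] + padded[i + 1]) / 2 for i in range(len(lx_column))]


def population_at_and_above(Lx_column):
    """T_x = L_x + L_{x+1} + ... accumulated backward from the oldest age."""
    totals, running = [], 0
    for value in reversed(Lx_column):
        running += value
        totals.append(running)
    return totals[::-1]


Tx = population_at_and_above(Lx_table)
for age, l, T in zip(ages, lx, Tx):
    e = T / l                      # expectation of life at age x
    rate_above = 1000 * l / T      # death-rate per 1000 at age x and over
    print(f"age {age}: Tx={T}, ex={e:.2f}, rate at x and over={rate_above:.1f}")
```

At age 100 this gives Tx = 74 and ex = 74/40 = 1.85 years, which agrees with the tabled expectation of life.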

Consider now Fig. 63. This is a diagram* in two parts, plotted
on an arithlog grid, namely, one in which the abscissal divisions are
arithmetically equal, and represent ages, and the ordinate divisions
are proportional to the logarithms of the numbers set down by
their side. On the left hand, or larger one of the two diagrams, in
which the logarithmic grid extends to 5 decks, there are plotted 3
lines, namely dx, lx, and Tx as defined above, for Glover’s1 life
table for males in the Original Registration States in 1910. Let
us for convenience call this larger diagram the nomogram base.
The dx, lx, and Tx data were as a matter of fact in this case taken
directly from Glover’s table. But the data for plotting Tx could
equally well have been got by accumulating by successive additions
the lx values of Glover’s table, beginning with lω and adding backward,
i. e., toward the beginning of life. It is important to note
this point because many life tables do not table Tx. The right
hand, or smaller one of the diagrams, which we may for convenience
call the nomogram scale, is simply the same logarithmic scale as that
of the larger of the two diagrams, but extending only from 1 to 1000.

* On account of the size of the page the original diagram is much reduced here.
This, however, is unimportant because the only purpose of Fig. 63 is to show how the
nomogram is constructed. The student should draw this nomogram for himself on a
large sheet of 5-deck arithlog paper.

Fig. 63. — A life table nomogram. For explanation see text.

Now consider the nomogram scale of Fig. 63 to be cut free from
the rest of the sheet and therefore freely movable.

We then have the following rules:

Rule I. To find the instantaneous death-rate qx. Place the
nomogram scale on the nomogram base in such a manner that A
is on the lx line at the age x. Then read the nomogram scale at
the point where, at age x, it is cut by the dx line of the nomogram
base. The value so read will be qx, the death-rate per 1000 living.

Rule  II.  To  find  the  death-rate  at  age  x and  over.  Place  the 
nomogram  scale  on  the  nomogram  base  in  such  a manner  that  A 
coincides  with  the  Tx  line  at  age  x.  Then  read  the  nomogram  scale 
at  the  point  where,  at  age  x,  it  is  cut  by  the  lx  line  of  the  nomogram 
base.  The  value  so  read  will  be  the  death  rate  per  1000  living  at 
age  x and  over. 

Rule III. To find the expectation of life ex. Place the nomogram
scale on the nomogram base in such a manner that B coincides
with the lx line, at age x. Then read the nomogram scale at the
point where, at age x, it is cut by the Tx line of the nomogram base.
The value so read will be the expectation of life at age x, in years.

Rule  IV.  To  find  the  number  of  living  persons  of  age  x per 
death  at  that  age.  Place  the  nomogram  scale  on  the  nomogram 
base  in  such  a manner  that  B coincides  with  the  dx  line,  at  age  x. 
Then  read  the  nomogram  scale  at  the  point  where,  at  age  x,  it  is 
cut  by  the  lx  line  of  the  nomogram  base. 

It  will  have  been  noted  that  the  nomogram  scale  is  identical 
with  the  scale  of  the  nomogram  base  throughout  the  range  of  the 
former,  namely,  from  1 to  1000.  This  fact  indicates  at  once  that 
the  use  of  the  nomogram  scale  may  be  replaced  by  an  ordinary  pair 
of  dividers.  The  rules  then  take  the  following  form. 

Label  one  point  or  leg  of  the  dividers  A and  the  other  B. 

Rule  I bis.  Place  leg  A on  the  lx  line  at  age  x.  Bring  leg  B to 
coincide  with  the  dx  line  at  age  x.  Lift  the  dividers  and  place  leg  A 
at  1000  on  the  nomogram  base  scale.  Then  read  qx  at  the  point 
below  where  leg  B touches  the  same  scale. 

Rule  II  bis.  Place  leg  A on  the  Tx  line  at  age  x and  bring  leg  B 
to  coincide  with  the  lx  line  at  age  x.  Lift  the  dividers  and  place 
leg A at 1000 on the nomogram base scale. Then read the death
rate  at  age  x and  over  on  the  point  below  1000  where  leg  B touches 
the  same  scale. 
Rule III bis. Proceed as in Rule II bis, but after lifting the
dividers  place  leg  A at  1 on  the  nomogram  base  scale  and  read 
expectation  of  life  at  the  point  above  1 where  leg  B touches  the 
same  scale. 

Rule  IV  bis.  Proceed  as  in  Rule  I bis,  but  after  lifting  the 
dividers  place  leg  A at  1 on  the  nomogram  base  scale  and  read 
number  of  living  persons  of  age  x per  death  at  that  age  at  the  point 
above  1 where  leg  B touches  the  same  scale. 

The  proof  of  the  above  rules  is  evident  from  the  equations. 
All  that  this  nomographic  treatment  essentially  does  is  to  take 
advantage  of  the  property  of  logarithms  which  enables  division 
to  be  accomplished  by  a process  of  subtraction.  The  subtracting 
of the logarithms is done geometrically. Simple as the idea involved
is, its very useful application to the functions of a life table
has apparently not hitherto been systematically made. J. A.
Field  (loc.  cit.,  Chap.  VI)  pointed  out  that  when  lx  is  plotted  on 
arithlog  paper  the  slope  of  the  tangent  at  any  point  of  the  curve  is  qx. 
This  principle  has  been  made  use  of  in  certain  cases  in  the  graphic 
presentation  of  life  curves.  But  a trial  convinces  one  at  once 
that  only  a very  rough  approximation  to  qx  may  be  obtained  in 
this  way. 
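
The four rules can be checked numerically, since each reading is an addition and subtraction of logarithms. The following is a minimal Python sketch, not the author's procedure: the function names and the sample values of lx, dx and Tx are illustrative assumptions, and A and B are taken as the graduations 1000 and 1 of the nomogram scale, as the numerical illustration further on in the text implies.

```python
import math

def scale_reading(anchor_value, anchor_mark, target_value):
    """Reading on a movable logarithmic scale whose graduation `anchor_mark`
    is laid on `anchor_value`, taken where the scale meets `target_value`."""
    log_reading = (math.log10(anchor_mark)
                   + math.log10(target_value)
                   - math.log10(anchor_value))
    return 10 ** log_reading

def rule_1(lx, dx):
    """qx per 1000 living: A (1000) on the lx line, read at the dx line."""
    return scale_reading(lx, 1000, dx)

def rule_2(Tx, lx):
    """Death-rate per 1000 at age x and over: A (1000) on Tx, read at lx."""
    return scale_reading(Tx, 1000, lx)

def rule_3(lx, Tx):
    """Expectation of life ex: B (1) on the lx line, read at the Tx line."""
    return scale_reading(lx, 1, Tx)

def rule_4(dx, lx):
    """Living persons per death at age x: B (1) on dx, read at lx."""
    return scale_reading(dx, 1, lx)

# Hypothetical life table values at a single age (not from Glover's table).
lx, dx, Tx = 80000.0, 800.0, 3200000.0
print(round(rule_1(lx, dx), 3))   # deaths per 1000 living
print(round(rule_2(Tx, lx), 3))   # deaths per 1000 at age x and over
print(round(rule_3(lx, Tx), 3))   # expectation of life in years
print(round(rule_4(dx, lx), 3))   # living persons per death
```

The dividers of the "bis" rules carry the same logarithmic distance from one place on the base scale to another, so the same function covers both forms of each rule.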

There are, of course, a number of useful corollaries of the four
rules given above, which will occur to the student. We shall mention
only one here, by way of numerical illustration of the kind of
service which this life table nomogram may render. As the world
approaches more and more closely to a condition of population
saturation, the populations of the various demographic units will
obviously come nearer and nearer to the conditions of life table
stability. This is a condition in which births bear a fixed and
constant relation of equality to deaths and there is no alteration
of the situation by migration. That we are now somewhat definitely
approaching such a condition in most highly industrialized
and civilized countries is evidenced by the concomitant decline of
birth- and death-rates, and the closer and closer approach of both
these rates to each other. The location of the exact levels at which
these rates will stabilize presents an interesting problem. It has
been suggested that a stable death-rate of the order of 7 or even 5
per thousand is quite within the range of possibility; is in fact
almost certain to be attained in comparatively few years.

Now, by Rule II, if (a) 1000 on the nomogram scale is placed, at
age 0 (birth), on the nomogram base so as to coincide with the Tx
line at that age, and (b) the point on the age 0 ordinate of the
nomogram base corresponding to the nomogram scale graduation 7 is
marked, and finally (c) the nomogram scale is then moved so that
B coincides with the point just marked on the nomogram base, it is
then found that the line Tx cuts the nomogram scale at approximately
143. This means that in order to have a stabilized death-rate
of 7 per thousand in a population unaffected by migration,
the mean or average duration of life (expectation of life at birth)
would have to be approximately 143 years! Under the same conditions
a death-rate stabilized at 5 would mean an expectation of
life at birth of exactly 200 years! A death-rate stabilized at 10,
under the above restriction, means an expectation of life at birth
(mean after-lifetime) of exactly 100 years! Of course such death-
rates as these of which we have been speaking are only attained
under one or the other or a combination of three conditions: (a)
a constantly increasing rate of growth of the population by an
ever-increasing birth rate, or (b) by immigration into the population
of persons of those ages where the age specific death-rates are low
and a concomitant or subsequent migration out of the population
of persons at advanced ages, where specific death rates are high,
to die elsewhere, or (c) a combination of an immigration of persons
of favorable ages and an increasing birth-rate sufficient always to
offset the necessity of old persons emigrating to die elsewhere.
But no one of these conditions is compatible, by definition, with a
stable population or a stable death-rate in the sense of a life table.
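
The arithmetic behind these figures is short. In a stationary life table population the yearly deaths equal l0 and the living population equals T0, so the death-rate per 1000 is 1000 × l0 / T0, which is 1000 divided by the expectation of life at birth. The Python sketch below only restates that relation; the function name is an illustrative assumption.

```python
def expectation_at_birth(stable_rate_per_1000):
    """Expectation of life at birth implied by a stationary death-rate.

    In a stationary population: rate per 1000 = 1000 * l0 / T0 = 1000 / e0.
    """
    return 1000.0 / stable_rate_per_1000

for rate in (7, 5, 10):
    print(rate, round(expectation_at_birth(rate), 1))
# 7 142.9
# 5 200.0
# 10 100.0
```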

The  usefulness  of  this  nomogram  in  experimental  work  on 
duration  of  life,  as  for  example  the  investigations  on  Drosophila 
discussed  in  the  next  section,  is  great.  In  such  experimental  work 
dx  (in  the  true  actuarial  sense)  is  directly  observed,  and  can  be  put 
upon  a per  thousand  base  and  directly  plotted.  From  this  lx, 
and  in  turn  Tx,  can  be  plotted.  Then  by  the  aid  of  this  nomographic 
method  all  the  important  functions  of  the  life  table  can  be  read 
directly,  without  the  necessity  of  any  computation  whatsoever. 
The case can never be quite so simple for human life tables, because
there we cannot observe the true life table dx line directly.
It  must  be  computed  from  the  statistical  data. 
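
For experimental material the chain from observed deaths to the other life table functions can be sketched as follows. This is a minimal Python illustration under stated assumptions: the death counts are hypothetical, the function name is invented for the example, the last age interval is assumed to contain at least one death, and Tx is formed by accumulating lx backward from the highest age, as described earlier in the text.

```python
def life_table_from_deaths(deaths):
    """Build dx, lx, Tx, qx and ex from observed deaths per age interval.

    deaths[i] is the number of individuals observed to die in interval i.
    dx and lx are put on a per-thousand base; qx is per 1000 living.
    """
    total = sum(deaths)
    dx = [1000.0 * d / total for d in deaths]

    lx = []
    alive = 1000.0
    for d in dx:
        lx.append(alive)      # survivors at the start of the interval
        alive -= d

    # Accumulate lx backward from the highest age to obtain Tx.
    Tx = [0.0] * len(lx)
    running = 0.0
    for i in range(len(lx) - 1, -1, -1):
        running += lx[i]
        Tx[i] = running

    qx = [1000.0 * d / l for d, l in zip(dx, lx)]
    ex = [t / l for t, l in zip(Tx, lx)]
    return dx, lx, Tx, qx, ex

# Hypothetical counts of flies dying in four successive age intervals.
dx, lx, Tx, qx, ex = life_table_from_deaths([10, 20, 40, 30])
print(lx)   # [1000.0, 900.0, 700.0, 300.0]
print(Tx)   # [2900.0, 1900.0, 1000.0, 300.0]
```

With dx, lx and Tx in hand, the nomogram readings (or the divisions they stand for) give the remaining functions.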

LIFE  TABLES  FOR  LOWER  ORGANISMS 

Life  tables  can  and  should  be  computed  for  other  forms  of  life 
besides  man.  Their  importance  for  the  study  of  organic  evolution 
can  scarcely  be  overestimated.  Owing  to  the  general  lack  in 
biologic  literature,  however,  of  the  basic  observational  data 
necessary  for  the  construction  of  a life  table,  only  the  merest 
beginning  has  been  made  in  this  direction. 

An  example  of  a complete  life  table  for  another  organism,  the 
fruit-fly,  Drosophila  melanogaster,  is  given  in  Tables  25  and  26,  and 
Fig. 64. These life tables were worked out in the author's laboratory.5, 6
Only two Drosophila life tables are given here. Similar
tables  for  females  will  be  found  in  the  original  publication.  The  lx 
curves  in  the  diagram  show  the  similarity  of  the  findings  to  those 
in  man,  remembering  that  the  fly  curves  are  plotted  on  an  arithlog 
grid  and  that  they  have  no  infant  mortality  component. 

The "vestigial" (Table 26) is a mutant form of Drosophila,
characterized  by  minute,  functionless  wings,  and  a shorter  life  span 
than  the  normal  form  (Table  25). 

An  interesting  problem  now  presents  itself.  How  shall  one 
compare  the  mortality  of  two  organisms  whose  total  life  spans  are 
so  widely  different  in  extent  of  time  that  it  is  in  practice  quite 
impossible  to  measure  or  express  them  in  the  same  unit? 

Various  methods  have  been  used  for  making  this  comparison. 
The  one  originally  used  by  the  author5  has  been  criticised  by 
Greenwood,8  whose  valuable  discussion  of  the  whole  problem 
should  be  read.  The  method  which  appears  to  be  the  most 
valid and least open to statistical objections is one which is a particular
application of a general method devised7 for the purpose of
comparing  the  relative  variability  of  different  organisms.*  In  the 
present  case  the  mode  of  application  of  this  principle  is  to  regard 
the  observed  mean  age  at  death  (duration  of  life)  of  the  individuals 

* See  the  section  on  Graphic  Representation  of  Relative  Variability  in  Chapter 
XIII  of  this  book. 
TABLE 25

Life Table for Drosophila, Wild Type, Line 107, Males

(Age in days.)

 Age     lx  1000 qx     ex       Age     lx  1000 qx     ex

   1   1000      0.2   45.8        46    551     43.5   12.3
   2   1000      0.6   44.8        47    527     46.6   11.8
   3    999      1.0   43.8        48    502     49.8   11.3
   4    998      1.3   42.9        49    477     53.2   10.9
   5    997      1.7   41.9        50    452     56.9   10.4
   6    995      2.0   41.0        51    426     60.8   10.0
   7    993      2.4   40.1        52    400     65.0    9.6
   8    991      2.8   39.2        53    374     69.5    9.2
   9    988      3.2   38.3        54    348     74.2    8.8
  10    985      3.5   37.4        55    322     79.2    8.4
  11    981      3.9   36.5        56    297     84.5    8.0
  12    978      4.3   35.7        57    272     90.2    7.7
  13    973      4.7   34.8        58    247     96.2    7.4
  14    969      5.1   34.0        59    223    102.5    7.0
  15    964      5.5   33.2        60    200    109.2    6.7
  16    958      6.0   32.3        61    179    116.3    6.4
  17    953      6.4   31.5        62    158    123.8    6.1
  18    947      6.9   30.7        63    138    131.7    5.9
  19    940      7.4   29.9        64    120    139.9    5.6
  20    933      7.9   29.2        65    103    148.8    5.3
  21    926      8.5   28.4        66     88    157.9    5.1
  22    918      9.0   27.6        67     74    167.6    4.9
  23    910      9.6   26.9        68     62    177.7    4.7
  24    901     10.3   26.1        69     51    188.3    4.4
  25    892     10.9   25.4        70     41    199.4    4.2
  26    882     11.7   24.6        71     33    211.0    4.1
  27    872     12.4   23.9        72     26    223.1    3.9
  28    861     13.3   23.2        73     20    235.8    3.7
  29    849     14.1   22.5        74     15    248.9    3.5
  30    837     15.1   21.8        75     12    262.6    3.4
  31    825     16.1   21.1        76      9    276.8    3.2
  32    811     17.2   20.5        77      6    291.5    3.1
  33    798     18.3   19.8        78      4    306.7    2.9
  34    783     19.5   19.1        79      3    322.5    2.8
  35    768     20.8   18.5        80      2    338.7    2.6
  36    752     22.3   17.9        81      1    355.5    2.4
  37    735     23.8   17.3        82      1    372.7    2.2
  38    717     25.4   16.7
  39    699     27.2   16.1
  40    680     29.1   15.5
  41    660     31.1   14.9
  42    640     33.3   14.4
  43    619     35.5   13.8
  44    597     38.0   13.3
  45    574     40.7   12.8
TABLE 26

Life Table for Drosophila, Vestigial, Males

(Age in days.)

 Age     lx  1000 qx     ex       Age     lx  1000 qx     ex

   1   1000      0.0   14.1        26     72    162.7    5.8
   2   1000      9.0   13.1        27     61    162.5    5.7
   3    991     18.1   12.2        28     51    162.0    5.6
   4    973     27.4   11.7        29     43    161.1    5.5
   5    946     36.7   10.7        30     36    160.8    5.4
   6    912     45.8   10.1        31     30    160.7    5.3
   7    870     55.4    9.6        32     25    161.5    5.1
   8    821     64.7    9.1        33     21    163.6    4.9
   9    768     73.8    8.6        34     18    167.7    4.6
  10    712     82.8    8.2        35     15    174.5    4.4
  11    653     91.5    7.9        36     12    184.8    4.1
  12    593    100.0    7.6        37     10    198.5    3.8
  13    534    108.1    7.3        38      8    219.6    3.4
  14    476    115.9    7.0        39      6    246.0    3.1
  15    421    123.1    6.8        40      5    279.6    2.8
  16    369    129.9    6.7        41      3    320.9    2.5
  17    321    136.2    6.5        42      2    370.4    2.3
  18    277    141.8    6.4        43      1    427.7    2.0
  19    238    146.8    6.3        44      1    491.9    1.7
  20    203    151.2    6.2
  21    172    154.8    6.1
  22    146    157.8    6.0
  23    123    160.0    6.0
  24    103    161.5    5.9
  25     86    162.4    5.9

in each different species to be compared as representing a biologically
equivalent point in each of their several life cycles, and then to
represent every other absolute age as a percentage deviation from
the mean duration of life of each particular species. Thus, for
example, if the mean absolute duration of life for a particular
species is fifty days (= 100 per cent. on this plan), then an individual
organism of that species dying at age seventy-five days will
be recorded as having a relative duration of life of 150 per cent.,
because it lived half again as long as the average of the individuals
of the species to which it belongs. Furthermore, as is usual in life
table work, frequencies are put upon a relative rather than an
absolute basis.
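
The conversion to a relative time base is a single division. A minimal Python sketch follows, using the fifty-day example of the text; the function names are illustrative assumptions.

```python
def relative_duration(age, mean_duration):
    """Age expressed as a percentage of the mean duration of life."""
    return 100.0 * age / mean_duration

def percentage_deviation(age, mean_duration):
    """Age expressed as a percentage deviation (plus or minus) from the mean."""
    return 100.0 * (age - mean_duration) / mean_duration

# The example of the text: mean duration 50 days, death at 75 days.
print(relative_duration(75, 50))      # 150.0
print(percentage_deviation(75, 50))   # 50.0
```

Both arguments must be in the same absolute unit (days, hours, months or years); the result is then free of that unit, which is what allows forms with very different life spans to be set side by side.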

In Table 27 and Fig. 65 are shown the results of certain comparisons
made in this way between a few widely separated organisms,
in respect of the order in time of their dying.
The mean absolute durations of life for the forms shown in
Table 27 are given in Table 28.

Fig. 64. Diagram showing the observed and graduated lx points for (a) line 107
wild type, and (b) vestigial flies. The small circles are the observations, and the
smooth lines the fitted curves from the equations.

From Table 27 and Fig. 65 certain results are at once apparent:
1. The life curves fall clearly into three groups. The first of
these, Group A, approximates to the rectangular type of survivorship
curve, in which (in the limiting case) all the individuals live to
a certain age and then die together at the same instant. This
limiting type is approached, though of course not precisely realized,
in the present material by (a) the rotifer Proales and (b) the fly
TABLE 27

Survivorship Distribution (lx) for Ages Expressed as Percentage Deviations
from Mean Duration of Life

Percentage  Drosophila  Drosophila  Drosophila  Proales     Hydra       Blatta      Agriolimax  Mice        Automobiles
deviation   wild (107)  vestigial   starved     decipiens   fusca       orientalis              (Hill's
from mean   ♂♂          ♂♂          ♂♂                                                          data)

-100        1000        1000        1000        1000        1000        1000        1000        1000        1000
-80         988         993         991         1000        875         994         787         988         979
-60         945         924         981         996         747         977         664         961         912
-40         867         795         967         967         629         908         583         901         801
-20         741         635         914         849         526         752         519         775         654
0           556         469         537         543         440         513         454         548         488
+20         322         323         71          147         367         258         398         247         325
+40         118         211         5           1           302         83          330         40          191
+60         19          132         ...         ...         242         15          259         1           94
+80         1           80          ...         ...         182         1           190         ...         38
+100        ...         49          ...         ...         119         ...         129         ...         12
+120        ...         30          ...         ...         59          ...         81          ...         3
+140        ...         18          ...         ...         18          ...         46          ...         ...
+160        ...         11          ...         ...         3           ...         24          ...         ...
+180        ...         5           ...         ...         ...         ...         11          ...         ...
+200        ...         2           ...         ...         ...         ...         5           ...         ...
+220        ...         ...         ...         ...         ...         ...         2           ...         ...
+240        ...         ...         ...         ...         ...         ...         1           ...         ...

TABLE 28

Mean Duration of Life in Absolute Time Units

Hydra fusca (a)                  54.89 days
Proales decipiens (b)             5.95 days
Agriolimax agrestis (c)           4.12 months
Blatta orientalis (d)            40.89 days
Drosophila, Wild (107) ♂♂        45.81 days
Drosophila, Starved ♂♂           44.09 hours
Mouse (e)                       636.50 days
Automobiles (f)                   7.04 years

(a) Life table calculated from data in Hase, A.: Über die deutschen Süsswasser-
Polypen Hydra fusca, etc. Arch. f. Rassen- u. Gesellschafts-Biologie, Bd. 6, pp. 721-
753, 1909.

(b) Pearl, R., and Doering, C. R.: A Comparison of the Mortality of Certain Lower
Organisms with that of Man, Science, N. S., vol. 57, pp. 209-212, 1923.

(c) Life table calculated from data in Szabo, I., and Szabo, M.: Lebensdauer,
Wachstum und Altern, studiert bei der Nacktschneckenart Agriolimax agrestis L.
Biologia Generalis, Bd. 5, pp. 95-118, 1929.

(d) Life table calculated from data in Rau, P.: The Biology of the Roach, Blatta
orientalis Linn. Trans. Acad. Sci., St. Louis, vol. 25, pp. 57-79, 1924.

(e) Life table calculated from Hill data given by Greenwood.8

(f) Data from Griffin, C. E.: The Life History of Automobiles, Michigan Business
Studies, vol. i (University of Michigan), 1928, p. v + 42.
Drosophila under conditions of complete starvation. In this group
the characteristic feature of the mortality is that there is no death-
rate at all (or a negligible one) until the upper end of the life span
is nearly reached. Then there is an explosive outbreak of mortality
which kills all the individuals within a short time interval. In these
cases the upper end of the life span stands, in terms of relative age,
to the mean duration of life as roughly 140 : 100. This type of life
curve may most properly be designated as the limit of negatively
skew life curves, because the dx curve which gives rise to this type
of rectangular lx line is characterized by negative skewness (cf.
Chap. XIII, Skewness).

Fig. 65. Survivorship lines for various species of animals and the automobile,
on a relative time base. For each form represented the mean duration of life is taken
as 100 per cent on the abscissal side, and all other ages (time durations) are expressed
as percentage deviations (plus or minus) from this mean. Further explanation in text.
(Abscissa: percentage deviation from mean duration of life.)

The next group of life curves is

2. Group B. The intermediate type of survivorship curve.
This is represented in the present material by normal wild (107)
Drosophila, the cockroach (Blatta orientalis), and the mouse.
The common characteristics of the order of dying out of these
forms are, first, that the death-rates tend to increase smoothly
with age throughout the life span, and at a more rapid rate
in the second half than in its first half. This would seem to be
the characteristic mode of wearing out of non-living, man-made
machines, exemplified by Griffin's automobile life curve here
depicted. In the second place, forms showing this intermediate,
"wearing-out" type of order of dying have the total life span, in
terms of relative age, standing in relation to the mean duration of
life, as, on the average, roughly 185 : 100.

We  come  next  to 

3. Group C. The diagonal type of survivorship curves, with a
constant death-rate until near the end of the life span (in the
theoretical limiting case). In the present material the fresh water
polyp Hydra fusca, the slug Agriolimax agrestis, and the Drosophila
mutant vestigial approach this type of order of dying out. These
forms have as common characteristics, first, an approach to a
constant death-rate at all ages, and, second, a very wide ratio of
total life span to mean duration of life. On the average, for these
three forms, this ratio is 300 : 100, or, in other words, some
individuals live three times as long as the average of the population.
If man had this characteristic the upper limit of his life span
would be around 175 to 180 years instead of around 100 years
as it is.

Space  cannot  be  given  here  for  further  discussion  of  these 
matters. The student who is interested in them will find further
details  in  references  5 and  6 at  the  end  of  this  chapter. 

STATIONARY  POPULATIONS 

The  stationary  population  of  a life  table  serves  a useful  purpose 
as  a standard  in  the  computation  of  certain  derived  rates  to  be 
discussed  in  the  next  chapter.  For  this  purpose  it  is  desirable  to 
have  this  function  on  the  basis  of  a total  population  of  1,000,000 

TABLE 29

Stationary Life Table Population of 1,000,000 Persons. Number Living in
Each Yearly Interval of Age

(Persons = persons per million in current age interval.)

    Age                     Age                     Age
 interval   Persons      interval   Persons      interval   Persons

     0-1     17,841        35-36     14,146        70-71      6,373
     1-2     16,916        36-37     14,031        71-72      5,979
     2-3     16,612        37-38     13,912        72-73      5,579
     3-4     16,448        38-39     13,791        73-74      5,178
     4-5     16,338        39-40     13,667        74-75      4,776
     5-6     16,255        40-41     13,540        75-76      4,375
     6-7     16,186        41-42     13,411        76-77      3,978
     7-8     16,127        42-43     13,278        77-78      3,589
     8-9     16,078        43-44     13,141        78-79      3,210
    9-10     16,036        44-45     13,000        79-80      2,843
   10-11     15,998        45-46     12,854        80-81      2,490
   11-12     15,962        46-47     12,702        81-82      2,152
   12-13     15,927        47-48     12,545        82-83      1,835
   13-14     15,890        48-49     12,383        83-84      1,546
   14-15     15,851        49-50     12,216        84-85      1,287
   15-16     15,808        50-51     12,045        85-86      1,058
   16-17     15,761        51-52     11,867        86-87        859
   17-18     15,708        52-53     11,683        87-88        687
   18-19     15,650        53-54     11,489        88-89        541
   19-20     15,586        54-55     11,284        89-90        418
   20-21     15,516        55-56     11,067        90-91        318
   21-22     15,441        56-57     10,836        91-92        236
   22-23     15,363        57-58     10,592        92-93        172
   23-24     15,282        58-59     10,336        93-94        123
   24-25     15,200        59-60     10,069        94-95         86
   25-26     15,117        60-61      9,791        95-96         59
   26-27     15,032        61-62      9,501        96-97         39
   27-28     14,946        62-63      9,199        97-98         26
   28-29     14,857        63-64      8,884        98-99         17
   29-30     14,765        64-65      8,556       99-100         10
   30-31     14,671        65-66      8,217      100-101          6
   31-32     14,573        66-67      7,868      101-102          4
   32-33     14,472        67-68      7,508      102-103          2
   33-34     14,367        68-69      7,139      103-104          1
   34-35     14,259        69-70      6,760      104-105          1
persons living. The necessary computations have been done for
three  age  class  ranges  and  the  results  are  presented  in  Tables  29, 
30,  and  31,  on  the  basis  of  the  Lx  data  of  Table  24  above.  This 


TABLE 30

Stationary Life Table Population of 1,000,000 Persons. Number Living in
Each Five-yearly Interval of Age

Age interval.     Persons per million in
                  current age interval.

    0-4                  84,155
    5-9                  80,682
  10-14                  79,628
  15-19                  78,513
  20-24                  76,802
  25-29                  74,717
  30-34                  72,342
  35-39                  69,547
  40-44                  66,370
  45-49                  62,700
  50-54                  58,368
  55-59                  52,900
  60-64                  45,931
  65-69                  37,492
  70-74                  27,885
  75-79                  17,995
  80-84                   9,310
  85-89                   3,563
  90-94                     935
  95-99                     151
100-104                      14


TABLE 31

Stationary Life Table Population of 1,000,000 Persons. Number Living in
Each Ten-yearly Interval of Age

Age interval.     Persons per million in
                  current age interval.

    0-9                 164,837
  10-19                 158,141
  20-29                 151,519
  30-39                 141,889
  40-49                 129,070
  50-59                 111,268
  60-69                  83,423
  70-79                  45,880
  80-89                  12,873
  90-99                   1,086
100 and over                 14

then  is  the  population  derived  from  the  life  table  for  the  original 
registration  states  in  1910,  both  sexes  together. 
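
The computation behind Tables 29 to 31 consists of two steps: putting the Lx column on a base of 1,000,000, and summing yearly intervals into wider ones. A minimal Python sketch follows; the function names are illustrative assumptions, and the yearly figures are the first ten entries of Table 29.

```python
def standard_million(Lx):
    """Scale a column of Lx values so that it sums to 1,000,000."""
    total = sum(Lx)
    return [1_000_000 * value / total for value in Lx]

def group_sum(values, width):
    """Sum consecutive runs of `width` yearly values into wider age intervals."""
    return [sum(values[i:i + width]) for i in range(0, len(values), width)]

# Ages 0-1 through 9-10 from Table 29 (persons per million).
yearly = [17841, 16916, 16612, 16448, 16338,
          16255, 16186, 16127, 16078, 16036]

print(group_sum(yearly, 5))    # [84155, 80682], the 0-4 and 5-9 entries of Table 30
print(group_sum(yearly, 10))   # [164837], the 0-9 entry of Table 31
```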

It is important that the student should have a clear mental
picture of the age distribution of a stationary life table population,
and of the manner in which it differs from the actually existing
general population upon which the life table is computed. Accordingly
there is inserted here Table 32. Table 32 exactly
corresponds to Table 30 in arrangement, but gives the age distribution
per million of the population of the United States of both
sexes actually living in 1910 by quinquennial age groups.

Fig. 66. Diagram comparing the standard million of (a) the life table stationary
population (stippled area), and (b) the actual population (cross-hatched area), both
for the year 1910, and for both sexes together. (Data of Tables 30 and 32.)
(Abscissa: age, 0 to 105 years.)

Figure  66  compares  the  life  table  standard  million  (from  Table 
30)  with  the  standard  million  of  the  actual  population. 
From this diagram it is apparent that the essential difference
between  actual  and  life  table  populations  in  this  country  consists 
in  the  former  having  an  excess  of  persons  in  early  life  (up  to  age 
thirty-eight  years  roughly)  and  a defect  of  persons  of  all  ages 
beyond  that.  This  difference  arises  mainly  from  two  causes: 
excess  of  births  over  deaths  and  of  immigration  over  emigration 
in  the  actual  population. 
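
The comparison drawn from the diagram can also be read off the two tables directly. The Python sketch below subtracts the life table standard million (Table 30) from the actual standard million (Table 32), group by group; the variable names are illustrative assumptions.

```python
age_starts = list(range(0, 105, 5))   # lower limits of the groups 0-4, ..., 100-104

# Table 30: life table stationary population, persons per million.
life_table = [84155, 80682, 79628, 78513, 76802, 74717, 72342, 69547, 66370,
              62700, 58368, 52900, 45931, 37492, 27885, 17995, 9310, 3563,
              935, 151, 14]

# Table 32: actual population of the United States, 1910, persons per million.
actual = [115806, 106321, 99203, 98728, 98656, 89104, 75947, 69672, 57314,
          48682, 42491, 30358, 24696, 18294, 12132, 7269, 3505, 1338,
          365, 80, 39]

# Positive values mark an excess of the actual over the life table population.
excess = [a - s for a, s in zip(actual, life_table)]

first_deficit_age = next(age for age, e in zip(age_starts, excess) if e < 0)
print(first_deficit_age)   # 40
```

On five-year groups the actual population is in excess through the group 35-39 (by only 125 per million there) and in deficit from the group 40-44 onward, which agrees with the rough crossing point of thirty-eight years read from the diagram.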


TABLE 32

Standard Million of Actual Living Population (Both Sexes) in the United
States, 1910

Age interval.     Persons per million.

    0-4                 115,806
    5-9                 106,321
  10-14                  99,203
  15-19                  98,728
  20-24                  98,656
  25-29                  89,104
  30-34                  75,947
  35-39                  69,672
  40-44                  57,314
  45-49                  48,682
  50-54                  42,491
  55-59                  30,358
  60-64                  24,696
  65-69                  18,294
  70-74                  12,132
  75-79                   7,269
  80-84                   3,505
  85-89                   1,338
  90-94                     365
  95-99                      80
100-104                      39


THE  CONSTRUCTION  OF  LIFE  TABLES 

The statement has already been made that it is not the intention
to go here into the methods actually employed in the construction of
a life table. It, however, seems only fair to outline the procedure in
general terms. The starting-point is the determination, from
recorded statistics of living population at ages, and deaths at ages
(and for the early part of life births, because of the inadequacy
at those ages of census counts of population, and because of the
rapidity of the flow of vital events in the first year of life) of the
specific death-rates at ages. From these specific death-rates (in the
sense of the vital statistician), which are symbolically designated
as mx values, the qx's of the life table are derived. The qx values
are then subjected to a more or less elaborate process of graduation
or smoothing, the purpose of which is to eliminate such portion of
the minor fluctuations in their values as may reasonably be supposed
due to chance. This smoothing process is where the heavy mathematics
of actuarial work comes in. Around this phase of the
subject a highly esoteric cult has grown up. In its fundamental
and essential principles the smoothing process is simple enough to
be grasped by any intelligent person, but, like many other things,
when finally dressed out in all its symbolic panoply it is forbidding.

After the qx values have been graduated the rest of the work of
constructing a life table is simple, even if tiresome in its extent.
The qx's are successively applied to an lx group starting with 100,000
at age zero (birth) to determine the dx's. When this is done one
has lx, dx, and qx for each age interval. From the lx's and dx's
the ex's are easily calculated.
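
The last step described above, from graduated qx values to lx, dx and ex, can be sketched in Python as follows. This is an illustration under stated assumptions rather than the actuarial procedure itself: the qx values are hypothetical fractions (not per 1000), the final qx is taken as 1 so that the table closes, and deaths are assumed to fall on the average at the middle of each age interval when the years lived are summed.

```python
def life_table_from_qx(qx, radix=100000.0):
    """Derive lx, dx and ex from graduated qx values.

    qx[i] is the probability of dying within age interval i (a fraction).
    Every qx before the last must be below 1 so that lx stays positive.
    """
    lx, dx = [], []
    alive = radix
    for q in qx:
        lx.append(alive)          # survivors entering the interval
        deaths = alive * q
        dx.append(deaths)         # deaths within the interval
        alive -= deaths

    # Assumption: years lived in interval x are Lx = lx - dx / 2.
    Lx = [l - d / 2.0 for l, d in zip(lx, dx)]

    # ex = (years lived at age x and over) / lx, accumulated from the top.
    ex = [0.0] * len(lx)
    remaining = 0.0
    for i in range(len(Lx) - 1, -1, -1):
        remaining += Lx[i]
        ex[i] = remaining / lx[i]
    return lx, dx, ex

# Hypothetical graduated values for a four-interval life span.
lx, dx, ex = life_table_from_qx([0.1, 0.2, 0.5, 1.0])
print([round(v) for v in lx])       # [100000, 90000, 72000, 36000]
print([round(v, 2) for v in ex])
```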

Short  methods  for  the  construction  of  life  tables  in  public 
health  work  have  been  discussed  by  Hayward3  and  Snow.9 
CHAPTER IX


STANDARDIZED  AND  CORRECTED  DEATH-RATES 

It  has  been  seen  in  Chapter  VII  (Table  15  and  Fig.  58)  that 
the  specific  death-rates  are  characteristically  different  at  different 
ages.  The  fact  is  also  brought  out  strikingly  by  the  qx  curve  of  the 
life  table.  Now  this  circumstance  must  obviously  have  important 
consequences  in  regard  to  the  use  of  general  death-rates  at  all  ages 
to  measure  the  comparative  mortality  in  different  communities. 
For  suppose  two  communities  to  have  absolutely  identical  specific 
death-rates  at  different  ages.  But  suppose,  further,  that  one  of  the 
communities  is  primarily  a manufacturing  place,  and  in  consequence 
has  a large  excess  of  young  adults  in  its  population,  whereas  the 
other  is  primarily  a residence  city  for  elderly,  retired  persons.  The 
former  will  have  relatively  few  persons  of  advanced  age  where  the 
specific  death-rates  are  high.  The  latter  will  have  relatively 
many  of  such  persons.  In  consequence  of  this  difference  in  the 
age  distribution  of  the  living  the  two  places  are  bound  to  have  quite 
different  general  death-rates  at  all  ages,  even  though,  as  postulated, 
all  the  specific  death-rates  are  identical  in  the  two  places. 

It  therefore  follows  that  crude  death-rates  at  all  ages  should  be 
corrected  to  allow  for  differences  in  the  age  distribution  of  the  general 
population.  This  may  be  done  by  the  use  of  what  are  called 
standardized  and  corrected  death-rates. 

STANDARDIZED  DEATH-RATES 

A standardized death-rate is an abstract or theoretic figure derived
by applying the specific death-rates of the general population,
or of some standard imaginary population, to the actually
existing age and sex distribution of the living population of a
particular locality to determine what would be the number of
deaths in that locality if the specific death-rates of the standard
population prevailed there, and then dividing the number of deaths
so obtained by the actual total living population of the locality.
In the calculation of the standardized death-rate the actual
deaths in the locality do not enter at all. Expressed in a formula
the case is like this:

\[ R_{st} = \frac{\sum (P_x \times q_x)}{\sum P_x} \]

where

Rst = a standardized death-rate,

Px = actual living population of age x in the community for which the rate
is calculated,

qx = the specific death-rate at age x in the general population, or in the life
table population, or in some other arbitrarily chosen standard population,

Σ denotes summation over all values of x.

(Standardized death-rates are usually expressed as per 1000 of population.)


An  example  will  make  the  case  clear. 

Suppose we take the life table population of the original Registration
states in 1910, as determined by Glover, as a standard of
reference, and confine attention, for the sake of simplicity, to age
alone, dealing with both sexes together. We find the
specific death-rates at ages in that population to be as given in
Table 33.

TABLE 33

Life Table Death-rates, from Table 24 Supra

Age interval.          Rate of mortality per thousand
                       living in current age interval.

Under 5                         37.19
5-9.9                            3.44
10-19.9                          2.93
20-39.9                          6.64
40-59.9                         15.28
60-79.9                         56.22
80 and over                    190.61

All ages together               19.42


Now an examination of the Mortality Statistics reveals that in
the year 1910 the crude death-rate was,

In Providence, R. I.     17.66 per thousand
In Seattle, Wash.        10.05 per thousand

But the census of 1910 revealed further that the living populations
of these two cities were constituted in respect of age as shown
in Table 34.


TABLE 34

Actual Living Population in 1910 of Providence and Seattle

Age interval.     Population in thousands     Population in thousands
                  of Providence, R. I.        of Seattle, Wash.

Under 5                  21.814                      17.043
5-9.9                    18.707                      15.123
10-19.9                  38.315                      32.666
20-39.9                  83.563                     109.340
40-59.9                  46.482                      49.817
60-79.9                  14.111                      10.140
80 and over               1.058                        .590

Totals                  224.050                     234.719

It  is  at  once  apparent  that  while  these  two  cities  were  of  about 
the same total size in 1910, the age distributions of the two populations
were widely different. Providence had a great many more
young  people  under  twenty  than  had  Seattle.  Seattle,  on  the 
contrary,  had  many  more  young  adults  (twenty  to  thirty-nine) 
than  had  Providence.  Plainly,  Seattle  would  be  bound  to  have  a 
lower  crude  death-rate  than  Providence,  because  there  were  in  the 
population  fewer  persons  to  whom  high  specific  death-rates  apply, 
and  more  persons  to  whom  low  specific  rates  apply. 

Now,  according  to  the  rule  set  forth  above,  to  get  the  standardized 
death-rate  it  is  merely  necessary  to  perform  the  operations  shown 
in  Table  35. 

TABLE 35

Expected Deaths in Providence and Seattle in 1910 if the Life Table
Death-rates Prevailed

Age interval.    Providence population × life       Seattle population × life
                 table specific death-rates         table specific death-rates
                 (= deaths which would have         (= deaths which would have
                 occurred in Providence if life     occurred in Seattle if life
                 table rate of mortality had        table rate of mortality had
                 existed there).                    existed there).

Under 5               811.26                             633.83
5-9                    64.35                              52.02
10-19                 112.26                              95.71
20-39                 554.86                             726.02
40-59                 710.24                             761.20
60-79                 793.32                             570.07
80 and over           201.67                             112.46

Totals               3247.96 = Σ (Px × qx)              2951.31 = Σ (Px × qx)
Hence

For Providence:

\[ R_{st} = 1000 \left( \frac{3247.96}{224050} \right) = 14.50 \]

For Seattle:

\[ R_{st} = 1000 \left( \frac{2951.31}{234719} \right) = 12.57 \]
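The operations of Table 35 can be retraced with a short Python sketch. The figures are those of Tables 33 and 34; the function name `standardized_rate` is merely an illustrative label. Because populations are entered in thousands and rates per thousand, each product is directly a number of expected deaths.

```python
# Life table specific death-rates per 1000 (Table 33)
std_rates = [37.19, 3.44, 2.93, 6.64, 15.28, 56.22, 190.61]

# Living populations in thousands by age group (Table 34)
providence_pop = [21.814, 18.707, 38.315, 83.563, 46.482, 14.111, 1.058]
seattle_pop = [17.043, 15.123, 32.666, 109.340, 49.817, 10.140, 0.590]


def standardized_rate(pop_thousands, rates_per_1000):
    """Return (expected deaths, standardized death-rate per 1000)."""
    # thousands of persons x deaths per thousand = expected deaths
    expected = sum(p * q for p, q in zip(pop_thousands, rates_per_1000))
    # deaths divided by population in thousands = rate per 1000
    return expected, expected / sum(pop_thousands)


prov_deaths, prov_rst = standardized_rate(providence_pop, std_rates)
sea_deaths, sea_rst = standardized_rate(seattle_pop, std_rates)
print(round(prov_deaths, 2), round(prov_rst, 2))
print(round(sea_deaths, 2), round(sea_rst, 2))
```

Note that no deaths actually recorded in either city appear anywhere in the inputs.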


These  figures  tell  us  that  if  identical  forces  of  mortality  had 
operated  in  Providence  and  Seattle,  the  crude  rates  of  the  two  places 
would  have  been  different  in  the  ratio  indicated,  solely  because  of 
differences  in  the  age  constitution  of  the  living  population.  But 
it  cannot  have  failed  to  impress  one  that  it  is  a curious  use  of  words 
to call this standardized rate a death-rate of Providence, for example,
because  in  its  calculation  no  account  whatever  is  taken  of  the  deaths 
which  occurred  in  Providence.  Providence’s  statistics  only  enter 
into  the  situation  at  all  in  respect  of  the  living,  not  the  dead.  But 
surely  a death-rate  may  not  unreasonably  be  required  to  have  in 
it  something  about  the  deaths  which  really  occurred. 

Can  this  be  done  on  the  basis  of  only  such  data  as  are  now  in 
hand?  It  can,  and  in  this  way.  It  has  already  been  seen  from 
Table  33  that,  in  the  life  table  population  which  we  are  taking  as  a 
standard,  the  death-rate  for  all  ages  together  is  19.42  per  thousand. 
Now  then  it  is  obvious  that  the  standardized  rates  which  have  been 
obtained  above  for  Providence  and  Seattle  differ  from  the  death- 
rate  for  all  ages  in  the  standard  population,  only  because  of  the 
differences in the age distribution of the living in the actual populations
of Providence and Seattle respectively, and of the living in
the  standard  population.  Therefore  it  follows  that  the  ratio 


\[ \frac{\text{Death-rate in standard population}}{\text{Standardized death-rate of local population}} \]


will  give  a correction  factor  which  will  measure  the  amount  by 
which  the  crude  death-rate  of  the  local  population  is  altered  from 
the  death-rate  at  all  ages  of  the  standard  population,  as  a result 
solely  of  the  difference  between  the  two  populations  in  respect  of  the 
age  distribution  of  the  living. 

We  then  have 


\[ \text{Correction factor for Providence} = \frac{19.42}{14.50} = 1.339 \]

\[ \text{Correction factor for Seattle} = \frac{19.42}{12.57} = 1.545 \]

These figures indicate that the crude death-rates of both cities are
lower  than  they  would  be  if  their  living  populations  had  the  same 
age  distribution  as  the  standard  population,  even  though  both 
cities  had  the  same  specific  forces  of  mortality  that  they  actually  do. 
If  the  correction  factor  were  less  than  1 it  would  mean  that  the 
crude  death-rates  were  higher  than  they  would  be  in  a population 
of  the  same  age  distribution  as  the  standard. 

Now,  as  has  been  seen,  the  crude  death-rate  of  Providence  was 
17.66,  and  of  Seattle  10.05.  So  then, 

17.66 × 1.339 = 23.65 = a death-rate for Providence in which is included (a) the
specific forces of mortality peculiar to Providence (introduced implicitly in the
crude rate 17.66); and (b) an allowance for the peculiar age distribution of the
living population of Providence, which brings it to identity with the age
distribution of the standard population.

Similarly for Seattle, we have

10.05 × 1.545 = 15.53 = a death-rate for Seattle which has the same properties as
those described above for Providence.
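The correction-factor arithmetic may be checked with the following sketch. It uses only figures already given in the text; the dictionary layout is an illustrative choice, and the factor is rounded to three decimals before multiplying because that is how the figures above are carried.

```python
standard_rate = 19.42  # all-ages death-rate of the life table population (Table 33)

cities = {
    # name: (crude death-rate, standardized death-rate), both per 1000
    "Providence": (17.66, 14.50),
    "Seattle": (10.05, 12.57),
}

adjusted = {}
for name, (crude, standardized) in cities.items():
    factor = round(standard_rate / standardized, 3)  # correction factor
    adjusted[name] = crude * factor                  # crude rate x factor
    print(name, factor, round(adjusted[name], 2))
```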


CORRECTED  DEATH-RATES 


A corrected  death-rate  is  an  abstract  or  theoretic  figure  got  by 
applying  the  specific  death-rates  observed  in  a local  population  to 
the  age  and  sex  distribution  of  some  arbitrarily  chosen  standard 
population.  A corrected  death-rate  is,  in  short,  just  the  reverse  of 
a standardized  death-rate.  It  answers  questions  like  the  following: 
What  would  be  the  death-rate  of  city  A if  instead  of  having  the 
actual  age  distribution  of  the  population  which  it  has,  it  had  an 
age  distribution  identical  with  that  of  the  standard  population? 
How  much  of  the  difference  in  the  crude  death-rates  of  cities 
A and  B is  to  be  attributed  to  the  fact  that  the  age  distributions 
of  the  populations  are  different  in  the  two  places? 

The formula for a corrected death-rate is,

\[ R_{co} = \frac{\sum (L_x \times R_{sx})}{\sum L_x} \]

where

Rco = a corrected death-rate,

Lx = the number of persons of age x in the standard population,

Rsx = the specific death-rate at age x observed in the particular locality for
which the corrected rate is being calculated,

Σ denotes summation over all values of x.

(Corrected death-rates are usually expressed as per 1000 of population.)
Coming back to the Providence-Seattle example we have already
had  given  in  Table  34  the  populations  of  these  two  cities  at  ages. 
Table  36  gives  the  deaths  at  ages  in  columns  (1)  and  (2).  By 
dividing  each  figure  in  column  (1)  of  Table  36  by  the  corresponding 
population  figure  of  Table  34,  we  shall  get  the  specific  death-rates 
of  Providence  set  down  in  column  (3),  and  similarly  for  Seattle  in 
column  (4). 

TABLE 36

Specific Death-rates Per Thousand of Providence and Seattle

Age interval.   Deaths in     Deaths in    Specific death-rate          Specific death-rate
                Providence.   Seattle.     in Providence (per 1000).    in Seattle (per 1000).
                (1)           (2)          (3)                          (4)

Under 5         1175          453          1175 / 21.814 =  53.86       453 /  17.043 =  26.58
5-9               74           50            74 / 18.707 =   3.96        50 /  15.123 =   3.31
10-19            144          107           144 / 38.315 =   3.76       107 /  32.666 =   3.28
20-39            596          623           596 / 83.563 =   7.13       623 / 109.340 =   5.70
40-59            854          625           854 / 46.482 =  18.37       625 /  49.817 =  12.55
60-79            954          447           954 / 14.111 =  67.61       447 /  10.140 =  44.08
80 and over      182          103           182 /  1.058 = 172.02       103 /    .590 = 174.58

Totals          3979         2408
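The division that produces columns (3) and (4) of Table 36 can be sketched as follows. The deaths are those of columns (1) and (2), the populations (in thousands) those of Table 34, and the helper name `specific_rates` is only an illustrative label.

```python
providence_deaths = [1175, 74, 144, 596, 854, 954, 182]
seattle_deaths = [453, 50, 107, 623, 625, 447, 103]

# Living populations in thousands by age group (Table 34)
providence_pop = [21.814, 18.707, 38.315, 83.563, 46.482, 14.111, 1.058]
seattle_pop = [17.043, 15.123, 32.666, 109.340, 49.817, 10.140, 0.590]


def specific_rates(deaths, pop_thousands):
    """Deaths divided by population in thousands = rate per 1000, age by age."""
    return [d / p for d, p in zip(deaths, pop_thousands)]


providence_rates = specific_rates(providence_deaths, providence_pop)
seattle_rates = specific_rates(seattle_deaths, seattle_pop)
for p, s in zip(providence_rates, seattle_rates):
    print(f"{p:8.2f} {s:8.2f}")
```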

The  next  step  is  to  multiply  the  appropriate  standard  population 
figures  derived  from  Tables  29,  30,  and  31  of  the  preceding  chapter 
by the specific death-rates of Table 36 above, to get the number of
deaths  which  would  have  occurred  in  Providence  and  Seattle  had 
their  living  population  been  that  of  our  standard  million,  and  their 
specific  forces  of  mortality  as  they  were.  This  is  done  in  Table  37. 
TABLE 37

Deaths Expected in 1910 in Providence and Seattle if Their Populations
Had Had the Same Age Distribution as the Stationary Life Table Population

Age interval.   Persons in        (1) × Providence specific        (1) × Seattle specific
                standard          death-rates per 1000.            death-rates per 1000.
                population
                in thousands.
                (1)               (2)                              (3)

Under 5            84.155          84.155 ×  53.86 = 4,532.6        84.155 ×  26.58 = 2,236.8
5-9                80.682          80.682 ×   3.96 =   319.5        80.682 ×   3.31 =   267.1
10-19             158.141         158.141 ×   3.76 =   594.6       158.141 ×   3.28 =   518.7
20-39             293.408         293.408 ×   7.13 = 2,092.0       293.408 ×   5.70 = 1,672.4
40-59             240.338         240.338 ×  18.37 = 4,415.0       240.338 ×  12.55 = 3,016.2
60-79             129.303         129.303 ×  67.61 = 8,742.2       129.303 ×  44.08 = 5,699.7
80 and over        13.973          13.973 × 172.02 = 2,403.6        13.973 × 174.58 = 2,439.4

Totals          1,000.000                           23,099.5                         15,850.3

Whence we have:

For Providence:

\[ R_{co} = 1000 \left( \frac{23{,}100}{1{,}000{,}000} \right) = 23.10 \]

For Seattle:

\[ R_{co} = 1000 \left( \frac{15{,}850}{1{,}000{,}000} \right) = 15.85 \]
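The computation of Table 37 can be retraced as a sketch in Python. The standard population is that of column (1) of Table 37, and the local rates are the rounded values of Table 36; the function name `corrected_rate` is an illustrative label only.

```python
# Stationary life table population in thousands (Table 37, column 1)
standard_pop = [84.155, 80.682, 158.141, 293.408, 240.338, 129.303, 13.973]

# Local specific death-rates per 1000 (Table 36, columns 3 and 4)
providence_rates = [53.86, 3.96, 3.76, 7.13, 18.37, 67.61, 172.02]
seattle_rates = [26.58, 3.31, 3.28, 5.70, 12.55, 44.08, 174.58]


def corrected_rate(std_pop_thousands, local_rates_per_1000):
    """Apply local specific rates to the standard population; rate per 1000."""
    expected = sum(l * r for l, r in zip(std_pop_thousands, local_rates_per_1000))
    return expected / sum(std_pop_thousands)


prov_rco = corrected_rate(standard_pop, providence_rates)
sea_rco = corrected_rate(standard_pop, seattle_rates)
print(round(prov_rco, 2), round(sea_rco, 2))
```

Here the roles are reversed as compared with the standardized rate: the rates are local and the population is the standard.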

It  will  be  noted  at  once  that  these  corrected  death-rates  are 
nearly  the  same  as  those  got  by  the  correction  factor  from  the 
standardized rates above. There are thus available two different
methods  of  computation  for  getting  corrected  death-rates.  The 
method  given  in  this  section  is  the  more  refined  and  exact. 

The  same  principle  as  that  which  has  been  illustrated  in  Table 
37  can  be  successively  applied,  provided  the  necessary  data  are  at 
hand,  to  correct  death-rates  for  a whole  series  of  variables.  Actually, 
the necessary data are usually not available, so that when a corrected
death-rate is spoken of, all that is commonly meant is a
death-rate corrected for the age and sex distribution of the population.

THE  SIGNIFICANCE  OF  STANDARD  POPULATIONS  IN  CALCULATING 

CORRECTED  DEATH-RATES 

It will have been perceived by the thoughtful that a corrected
death-rate is nothing more than a weighted average of the local
specific death-rates, the weighting being in proportion to the portions
in each age group of the population chosen as the standard. Looking at a
corrected death-rate in this way one is led to ask the question:
What is the best system of weights to choose, or, in other words,
What shall be taken as the standard million of population?
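The weighted-average reading can be written out explicitly. The symbol w_x below is introduced here only for illustration and does not appear elsewhere in the chapter; it stands for the proportion of the standard population falling in age group x.

```latex
\[
R_{co} \;=\; \frac{\sum (L_x \times R_{sx})}{\sum L_x}
       \;=\; \sum w_x R_{sx},
\qquad
w_x = \frac{L_x}{\sum L_x},
\qquad
\sum w_x = 1 .
\]
```

Choosing a standard population is therefore the same thing as choosing the weights w_x.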

The  answer  to  this  question  depends  in  part,  as  do  all  similar 
questions  of  weighting,  upon  what  answer  is  given  to  the  further 
question:  What  do  you  want  to  do  with  the  corrected  death-rate 
after  you  get  it?  If  one’s  point  of  view  is  to  seek  what  would  be 
the  value  of  a local  death-rate  if  the  locality  had  the  average 
population  distribution  of  the  whole  country  of  which  it  is  a part, 
the  standard  population  will  be  so  chosen  as  to  be  nearly  or  quite 
identical  with  the  actually  existing  population  of  the  whole  country. 
This is the usual procedure. The Registrar-General of England and
Wales  uses  as  a standard  of  reference  the  age  and  sex  distribution  of 
the  actual  population  of  England  and  Wales  over  a period  of  years. 

If,  on  the  other  hand,  one  is  interested  in  getting  as  stable  a 
standard,  both  in  time  and  space,  as  is  possible,  the  Lx  population  of 
a life  table  will  be  better  than  any  actually  existing  population.  This 
will,  however,  just  because  it  is  not  a growing  population,  be  quite 
different from most existing populations in respect of age distribution,
as has already been seen in the preceding chapter. Table 38
shows a standard million of the population of the United States in
1910 distributed to the same age classes as used in the
Providence-Seattle example. It is obviously quite different from the life table
standard population given in Table 37.

TABLE 38

A Standard Million from the Actual Living Population of the United States
in 1910

Age interval.     Population both sexes     Population basis,
                  U. S., 1910.              1,000,000.

0-4                  10,631,364                 115,806
5-9                   9,760,632                 106,321
10-19                18,170,743                 197,931
20-39                30,605,272                 333,379
40-59                16,418,526                 178,845
60-79                 5,727,683                  62,391
80 and over             488,991                   5,327

Total                91,803,211*              1,000,000

* This total does not include “ages unknown.”
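The reduction of the census figures to a population basis of one million is a simple proportion, sketched below with the counts of Table 38. The variable names are illustrative only.

```python
# U.S. population, both sexes, 1910, by age group (Table 38)
us_1910 = [10_631_364, 9_760_632, 18_170_743, 30_605_272,
           16_418_526, 5_727_683, 488_991]

total = sum(us_1910)  # excludes "ages unknown", as in the table

# Each age group's share of the total, scaled to a million persons
standard_million = [1_000_000 * n / total for n in us_1910]
for n, s in zip(us_1910, standard_million):
    print(f"{n:12,d} {s:12,.0f}")
```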
Suppose we calculate the corrected death-rates of Providence
and  Seattle,  weighting  the  specific  death-rates  with  the  million  of 
Table  38  as  a standard.  The  result  is  that  shown  in  Table  39. 

TABLE 39

Expected Deaths in Providence and Seattle in 1910, on Basis of Actual United
States Population as Standard

Age interval.   Persons in        (1) × Providence specific        (1) × Seattle specific
                actual            death-rates per 1000.            death-rates per 1000.
                population,
                both sexes,
                in thousands.
                (1)               (2)                              (3)

0-4               115.806         115.806 ×  53.86 = 6,237.3       115.806 ×  26.58 = 3,078.1
5-9               106.321         106.321 ×   3.96 =   421.0       106.321 ×   3.31 =   351.9
10-19             197.931         197.931 ×   3.76 =   744.2       197.931 ×   3.28 =   649.2
20-39             333.379         333.379 ×   7.13 = 2,377.0       333.379 ×   5.70 = 1,900.3
40-59             178.845         178.845 ×  18.37 = 3,285.4       178.845 ×  12.55 = 2,244.5
60-79              62.391          62.391 ×  67.61 = 4,218.3        62.391 ×  44.08 = 2,750.2
80 and over         5.327           5.327 × 172.02 =   916.4         5.327 × 174.58 =   930.0

Total           1,000.000                           18,199.6                         11,904.2

Whence  the 

Corrected  death-rate  for  Providence  = 18.20 
Corrected  death-rate  for  Seattle  = 11.90 


These  values,  for  perfectly  obvious  reasons,  are  smaller  than 
those  got  above  on  the  basis  of  the  Lx  population  and  are  much 
nearer  absolutely  to  the  crude  rates.  The  ratios  of  the  death-rates 
for  the  two  cities  are  as  follows: 


Crude:

\[ \frac{17.66}{10.05} = 1.76 \]

Corrected (Lx pop. standard):

\[ \frac{23.10}{15.85} = 1.46 \]

Corrected (actual pop. standard):

\[ \frac{18.20}{11.90} = 1.53 \]

It  is  seen  that  the  judgment  of  the  relative  mortality  rates  of 
Providence  and  Seattle  is  not  sensibly  altered  if  use  is  made  of  the 
life  table  population  or  of  the  actually  existing  population  of  the 
whole  country  as  standard.  The  ratios  are  only  .07  apart.  But 
both  ratios  are  far  from  that  derived  from  the  crude  rates. 
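The figures of Table 39 and the three ratios can be recomputed together in a brief sketch. The standard million is that of Table 38 (in thousands), the local rates are the rounded values of Table 36, and the names used are illustrative only.

```python
# Standard million of the 1910 U.S. population, in thousands (Table 38)
us_standard = [115.806, 106.321, 197.931, 333.379, 178.845, 62.391, 5.327]

# Local specific death-rates per 1000 (Table 36)
providence_rates = [53.86, 3.96, 3.76, 7.13, 18.37, 67.61, 172.02]
seattle_rates = [26.58, 3.31, 3.28, 5.70, 12.55, 44.08, 174.58]


def corrected_rate(std_pop_thousands, local_rates_per_1000):
    expected = sum(l * r for l, r in zip(std_pop_thousands, local_rates_per_1000))
    return expected / sum(std_pop_thousands)


prov_us = corrected_rate(us_standard, providence_rates)
sea_us = corrected_rate(us_standard, seattle_rates)

# Ratio of the Providence rate to the Seattle rate on each basis
ratios = {
    "crude": 17.66 / 10.05,
    "corrected, Lx standard": 23.10 / 15.85,
    "corrected, actual U.S. standard": prov_us / sea_us,
}
for label, value in ratios.items():
    print(f"{label:35s} {value:.2f}")
```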

One can obviously build up standard populations in various
ways. One which has been used is to take a million persons so
distributed as to age (and sex if one wishes) as to yield 1000 deaths
per year on the basis either of (a) the specific death-rates of the
actual population of the whole country, or of (b) the specific
death-rates of the life table.

On  the  whole,  the  matter  is  really  one  of  arbitrary  choice, 
governed  essentially  by  taste  and  viewpoint  as  to  purpose,  rather 
than  strict  logic.  My  own  preference  is  for  the  Lx  population  of 
the  life  table  as  a standard,  because  of  its  inherent  stability.  If  one 
recognizes  that  any  corrected  death-rate  is  at  best  a purely  artificial 
figure,  there  will  be  no  need  to  worry  over  the  artificiality  of  a life 
table  population  as  a standard. 

From  a purely  biologic  viewpoint  probably  the  most  significant 
system  would  be  one  which  weighted  equally  each  specific  death- 
rate  and  averaged.  This  is  the  same  as  assuming  an  equal  number 
of  persons  in  each  age  group  of  the  standard  population.  This 
idea  is  not  likely  to  appeal  to  public  health  officials  or  to  professional 
official  vital  statisticians.  It  is  based  upon  these  considerations. 
Provided  the  subsamples  at  ages  are  sufficiently  large  each  to  give 
a reliable  rate,  having  regard  to  the  probable  errors,  any  age  and 
sex  specific  death-rate  is  a definite  quantitative  biologic  attribute 
of  the  group  to  which  it  applies.  It  differs  between  group  Ax  and 
group  Bx  because  of  one  or  the  other  or  both  of  the  following  factors, 
and  for  no  other  reason: 

1.  The  organisms  composing  Ax  are  inherently  different  from 
those  composing  Bx. 

2. The environment of Ax is different from that of Bx.

The  simple,  unweighted  average  of  age  specific  death-rates 
gives  in  a single  numeric  value  not  any  measure  of  the  public 
health,  but  an  excellent  measure  of  a highly  significant  biologic 
situation. It offers a method of getting a little nearer to an adequate
appreciation of the relative influence of constitution and
environment  in  determining  mortality  rates. 
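As a rough illustration of the unweighted average, the sketch below gives equal weight to the seven specific death-rates of Table 36. This is an assumption made for illustration only: those seven age groups are of unequal width, so the result is not necessarily the figure that a system of equal numbers at each single year of age would give.

```python
# Local specific death-rates per 1000 (Table 36)
providence_rates = [53.86, 3.96, 3.76, 7.13, 18.37, 67.61, 172.02]
seattle_rates = [26.58, 3.31, 3.28, 5.70, 12.55, 44.08, 174.58]


def unweighted_mean(rates):
    """Simple average: every age group counts equally."""
    return sum(rates) / len(rates)


prov_mean = unweighted_mean(providence_rates)
sea_mean = unweighted_mean(seattle_rates)
print(round(prov_mean, 2), round(sea_mean, 2))
```

The same values result from the corrected-rate formula when every Lx is set to the same number.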

OTHER  APPLICATIONS  OF  THE  CORRECTED  RATE  PRINCIPLE 

While  we  have  considered  so  far  in  this  chapter  only  examples  of 
forming  standardized  and  corrected  death-rates,  the  student  should 
understand  that  the  method  which  has  been  used  is  of  much  wider 
and more general application. In fact, the method is theoretically
perfectly general. It can be employed to correct any crude rate,
as  defined  at  the  beginning  of  Chapter  VII,  for  the  influence  of  any 
number  of  variables  for  which  the  requisite  data  are  available. 

In  further  illustration  of  the  method  it  is  proposed  now  to 
give  another  example  in  a different  field  than  death  or  death- 
rates.  The  material  for  the  example  is  given  in  a paper  by  Dr. 
Robert  H.  Riley,*  Director  of  the  Maryland  State  Department  of 
Health,  dealing  with  the  disease  infantile  paralysis  (poliomyelitis). 
His  Table  1 includes  the  case  incidence  of  the  disease  during  the 
1928  epidemic  outbreak  in  five  states.  Here  two  only  of  these 
states,  California  and  Minnesota,  will  be  taken  for  purposes  of 
illustration.  In  California  289  cases  occurred,  and  in  Minnesota 
221. We have then the following crude incidence rates, the population
being estimated population as of 1928.


For California:

\[ 100{,}000 \times \frac{289}{4{,}556{,}000} = 6.3 \]

= crude incidence rate per 100,000.

For Minnesota:

\[ 100{,}000 \times \frac{221}{2{,}722{,}000} = 8.1 \]

= crude incidence rate per 100,000.


On  the  basis  of  the  crude  rates  alone,  Minnesota  appears  to  have 
had  about  a third  heavier  incidence  of  the  disease  than  California. 
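The two crude incidence rates can be checked with a short sketch, using the case counts and estimated populations given in the text. The function name is an illustrative label.

```python
def crude_rate_per_100k(cases, population):
    """Cases per 100,000 of population, all ages together."""
    return 100_000 * cases / population


california = crude_rate_per_100k(289, 4_556_000)
minnesota = crude_rate_per_100k(221, 2_722_000)
print(round(california, 1), round(minnesota, 1))

# Relative excess of the Minnesota crude rate over the California crude rate
excess = minnesota / california - 1
print(f"{excess:.0%}")
```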

Table 40 gives (a) the age incidence of the cases, as reported by
Riley, (b) the estimated populations (in thousands) for the same
ages in 1928, and (c) the age specific incidence rates, got in each
case by dividing, line by line, (a) by (b).

TABLE 40

Cases of Poliomyelitis, Estimated Populations, and Age Specific Incidence
Rates Per 100,000 of Poliomyelitis in 1928 in California and Minnesota

                          California.                             Minnesota.

Age in years.    Cases of    Population    Incidence     Cases of    Population    Incidence
                 polio-      in            rate.         polio-      in            rate.
                 myelitis.   thousands.                  myelitis.   thousands.
                 (a)         (b)           (c)           (a)         (b)           (c)

Under 1            16           73         0.000219         0           57         0
1 to 4             73          292          .000250        72          242          .000298
5 to 9             89          374          .000238        68          283          .000240
10 to 14           35          346          .000101        36          267          .000135
15 to 19           35          323          .000108        23          250          .000092
20 and over        41         3148          .000013        22         1623          .000014

Totals            289         4556                        221         2722

* Riley, R. H.: Poliomyelitis, Jour. Amer. Med. Assoc., vol. 94, pp. 550-557, 1930.
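The line-by-line division of Table 40 can be sketched as below. One point deserves notice: the tabulated values of column (c) correspond to cases per person (for example, 16 / 73,000 = 0.000219), so the populations in thousands are multiplied by 1000 before dividing. The names used are illustrative only.

```python
# Cases and estimated populations in thousands (Table 40)
ca_cases = [16, 73, 89, 35, 35, 41]
ca_pop_thousands = [73, 292, 374, 346, 323, 3148]
mn_cases = [0, 72, 68, 36, 23, 22]
mn_pop_thousands = [57, 242, 283, 267, 250, 1623]


def incidence_rates(cases, pop_thousands):
    """Age specific incidence, expressed as cases per person."""
    return [c / (1000 * p) for c, p in zip(cases, pop_thousands)]


ca_rates = incidence_rates(ca_cases, ca_pop_thousands)
mn_rates = incidence_rates(mn_cases, mn_pop_thousands)
for c, m in zip(ca_rates, mn_rates):
    print(f"{c:.6f} {m:.6f}")
```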

Using the standard million of the stationary life table population
for reference, we have in Table 41 the number of cases of
poliomyelitis expected to occur in that population under the age
specific incidence rates of California and Minnesota respectively.

TABLE 41

Expected Incidence of Poliomyelitis in California and Minnesota if Both
Had the Same Standard Population (the Stationary Life Table Population)

                                      California.                        Minnesota.

Age in years.    Stationary     Age specific    Expected cases     Age specific    Expected cases
                 life table     incidence       in specified       incidence       in specified
                 population     rates from      standard million   rates from      standard million
                 (in            Table 40.       of population.     Table 40.       of population.
                 thousands).
                 (a)            (b)             (a) × (b)          (b)             (a) × (b)

Under 1            17.841       0.000219             3.9           0                    0
1 to 4             66.314        .000250            16.6            .000298            19.8
5 to 9             80.682        .000238            19.2            .000240            19.4
10 to 14           79.628        .000101             8.0            .000135            10.7
15 to 19           78.513        .000108             8.5            .000092             7.2
20 and over       677.022        .000013             8.8            .000014             9.5

Totals          1,000.000                           65.0                               66.6

It  thus  appears  that  the  incidence  rates  of  poliomyelitis  in 
California  and  Minnesota  in  1928,  when  corrected  to  the  same  age 
distribution  of  the  population  (that  of  the  stationary  life  table 
population), have the following values:

California:

\[ 100{,}000 \times \frac{65.0}{1{,}000{,}000} = 6.5 \text{ cases per 100,000 population.} \]

Minnesota:

\[ 100{,}000 \times \frac{66.6}{1{,}000{,}000} = 6.7 \text{ cases per 100,000 population.} \]


The difference in the crude rates of the two states thus disappears
upon correction for age differences.
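The correction of the poliomyelitis rates follows the same pattern as that of the death-rates, as the sketch below suggests. The standard population and the rounded age specific rates are those of Table 41; the function name is an illustrative label.

```python
# Stationary life table population in thousands (Table 41, column a)
standard_pop_thousands = [17.841, 66.314, 80.682, 79.628, 78.513, 677.022]

# Age specific incidence rates, cases per person (Table 40, column c)
ca_rates = [0.000219, 0.000250, 0.000238, 0.000101, 0.000108, 0.000013]
mn_rates = [0.0, 0.000298, 0.000240, 0.000135, 0.000092, 0.000014]


def corrected_per_100k(std_thousands, rates_per_person):
    """Expected cases in the standard population, per 100,000 persons."""
    expected = sum(1000 * p * r for p, r in zip(std_thousands, rates_per_person))
    persons = 1000 * sum(std_thousands)
    return 100_000 * expected / persons


california = corrected_per_100k(standard_pop_thousands, ca_rates)
minnesota = corrected_per_100k(standard_pop_thousands, mn_rates)
print(round(california, 1), round(minnesota, 1))
```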

In  the  illustrations  given  in  this  chapter  the  correction  has  been 
for  differences  in  age  distribution  of  different  populations.  The 
same  method  can  be  used  to  correct  for  differences  in  population 
distributions  relative  to  sex,  color,  race,  occupation,  and,  indeed, 
any  other  factor  for  which  the  necessary  data  are  available. 
CHAPTER X


THE  PROBABLE  ERROR  CONCEPT 

Perhaps  the  simplest  and  most  direct  way  in  which  statistical 
methods can be of practical use to the medical man in his everyday
problems is by giving him a means of measuring and stating
precisely the degree of reliability which attaches to any particular
set of results or conclusions he may reach. Only a little consideration
of the matter will be necessary to convince anyone that the
reliability  or  trustworthiness  of  any  conclusion  is  in  some  way  a 
function  of  the  number  of  cases  upon  which  it  is  based.  If  the 
number  of  cases  determined  forms  but  a small  sample  of  all  the 
cases  it  would  be  possible  to  collect,  it  is  probable  that  there  will 
be  considerable  fluctuation  among  the  results  given  by  such  small 
sampling. 

As  an  illustration  of  the  effect  of  random  sampling,  let  us 
consider  the  following  case:  In  any  large  city,  or  a state,  or  indeed, 
any  large  population  aggregate,  the  average  age  at  death  of  persons 
dying  at  the  same  calendar  date  should  be  identical  for  all  dates, 
except  for  the  influence  of  two  factors,  viz.,  (a)  chance,  or  random 
sampling, and (b) long seasonal waves arising from such considerations
as that relatively more infants die in hot summer weather than
in  the  colder  seasons  of  the  year.  In  any  short  period,  say  ten 
consecutive  days,  the  second  factor  (b)  would  not  operate  in  any 
sensible  degree,  and  we  should  expect  the  persons  dying  on  each  of 
these  consecutive  ten  days  to  show  the  same  average  age,  except 
for  the  fluctuations  due  to  chance  alone.  How  considerable  these 
fluctuations  may  be  is  shown  in  Table  42,  which  gives  the  number 
of  deaths  and  the  age  at  death  of  those  dying  during  ten  consecutive 
days  in  1916  in  Baltimore. 

TABLE 42

Mean Age at Death of Those Dying in the Stated Days in Baltimore

Date.                   Number of     Mean age at death
                        deaths.       in years.

January 13, 1916           31            30.16
January 14, 1916           40            43.80
January 15, 1916           27            40.59
January 16, 1916           48            48.21
January 17, 1916           32            48.34
January 18, 1916           41            51.90
January 19, 1916           39            46.82
January 20, 1916           31            52.39
January 21, 1916           39            51.62
January 22, 1916           57            39.40

Here we have a fluctuation in the average, based on samples of
from 30 to 50 individuals, amounting to more than twenty-two
years, arising from random sampling alone. Such an illustration
emphasizes the fact that before conclusions can safely be drawn
from differences between numbers it is necessary to know something
about the “probable errors” of those numbers.
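The extent of the fluctuation in Table 42 can be measured with a short sketch. The pooled mean, which weights each day's mean by its number of deaths, is an added illustration and is not a figure given in the text.

```python
# Table 42: deaths and mean age at death, January 13 to 22, 1916
deaths = [31, 40, 27, 48, 32, 41, 39, 31, 39, 57]
mean_age = [30.16, 43.80, 40.59, 48.21, 48.34, 51.90, 46.82, 52.39, 51.62, 39.40]

# Spread between the highest and the lowest daily mean
spread = max(mean_age) - min(mean_age)

# Mean age at death for all ten days together (weighted by number of deaths)
pooled = sum(n * m for n, m in zip(deaths, mean_age)) / sum(deaths)

print(round(spread, 2), round(pooled, 2))
```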

Another  example  of  random  fluctuations  may  be  given:  In 
“Who’s  Who”  the  names  are  entered  in  alphabetic  order.  If  one 
takes five names in order as they are given and determines the average
age at which these five persons married, and then takes the next
five  names  in  order  and  does  the  same  thing,  and  so  on,  there  is 
no  reason  why  the  average  ages  at  marriage  should  not  be  identical 
for  all  such  groups  of  five,  except  for  the  operation  of  chance.  Five 
is  a small  sample,  and  we  know  from  practical  experience  of  life 
that  probably  the  first  set  of  five  ages  at  marriage  so  chosen  will 
not  give  quite  the  same  average  as  the  second  set,  and  so  on. 

Table 43 gives the result of such an experiment with “Who’s
Who.”  I opened  Vol.  X (1918-19)  at  random  and  the  page  chanced 
to  be  680.  This  is  in  the  letter  D and  the  first  name  on  that  page 
is  William  Franklin  Dana.  I then  calculated  the  age  at  marriage 
for  each  person  in  order,  without  any  omissions  whatever,  except 
such  as  were  occasioned  by  (a)  failure  of  the  person  to  have  married, 
or  (b)  absence  of  birth  date  or  marriage  date,  or  both.  The  figures 
obtained  are  given  in  the  upper  half  of  Table  43.  As  soon  as  the 
fifth  age  of  each  set  of  five  was  set  down  the  average  for  that  group 
was  calculated  before  going  on  to  the  next  name.  This  was  kept 
up  till  ten  groups  or  fifty  names  had  been  taken  out. 
When this first series was done and the means plotted, it was
decided to take a second fifty names from another part of the alphabet.
So the book was opened again at random and the page
chanced  to  be  2486,  with  the  first  name  Frederic  Singer.  The  same 
procedure  as  before  for  fifty  consecutive  names  gave  the  bottom 
half  of  Table  43. 

TABLE 43

Showing the Average Age at Marriage of Ten Consecutive Groups of Five
Persons Each, Taken in Order from “Who’s Who” in Letter D
Beginning at p. 680.

Group.            I       III      V       VII      IX

Age at            22      30       30      28       31
marriage.         34      30       39      38       28
                  35      26       28      41       33
                  34      26       30      46       30
                  25      31       35      38       28

Average          30.0    28.6     32.4    38.2     30.0

Group.            II      IV       VI      VIII     X

Age at            23      29       33      32       28
marriage.         30      26       45      27       25
                  33      21       23      24       33
                  36      26       32      32       50
                  33      26       36      28       28

Average          31.0    25.6     33.8    28.6     32.8

A Second Group Like Above, but from Letter S
Beginning at p. 2486.

Group.            I       III      V       VII      IX

Age at            33      32       28      25       32
marriage.         25      30       35      37       28
                  28      28       35      31       28
                  31      27       29      32       27
                  28      36       22      32       32

Average          29.0    30.6     29.8    31.4     29.4

Group.            II      IV       VI      VIII     X

Age at            29      31       28      23       24
marriage.         31      24       29      27       24
                  23      26       45      30       33
                  30      30       25      27       30
                  27      25       35      31       29

Average          28.0    27.2     32.4    27.6     28.0

The means of the two series are shown graphically in Fig. 67,
the  solid  line  showing  the  group  means  for  the  50  persons  whose 
names  began  with  D,  and  the  broken  line  the  group  means  for 
the  persons  having  names  beginning  with  S. 

Fig. 67. Group averages of age at marriage of persons taken at random. (Data
from Table 43 above.) The Roman numerals indicate the order of the groups from
the starting-points indicated in the text. Solid line = data from upper half of table.
Broken line = data from lower half of table.

Table 43 and Fig. 67 show a number of interesting things about
random sampling and the phenomenon we call chance. In the
first place, the fluctuations of the group averages are large,
considering the inherent stability of the phenomenon with which we
are dealing. In the D series Group IV has a mean age at marriage
of 25.6 years, while Group VII has a mean of 38.2, almost thirteen
years higher. In the second place the means of the D series do not
fluctuate about a straight horizontal line. Instead there are three
more or less well-defined trends, downward from Group I to IV,
upward from Group IV to VII, and generally downward from Group
VII to the end.

In  the  third  place,  the  S series  does  not  show  such  extreme 
fluctuations of the group means, nor generally such high absolute
values  of  these  means,  as  does  the  D group.  In  the  fourth  place, 
there  is  apparently  a curious  suggestion  of  a rough  parallelism  in  the 
courses  of  the  lines  of  means  for  the  D and  S series.  Probably  not  a 
few  non-statistically  trained  experimental  investigators  would  be  apt 
to  say,  if  they  performed  a series  of  10  experiments  and  got  results 
like  those  shown  in  the  D series,  and  then  repeated  the  series  and 
got  results  like  those  shown  in  the  S series,  that  the  second  series 
confirmed  the  first.  So  it  does  in  respect  of  everything  except  the 
apparent  trends  in  the  D series,  in  respect  of  which  the  parallelism 
is  wholly  illusory.  The  case  well  illustrates  how  easy  it  is  to  be 
deceived  by  the  general  impression  of  parallelism  of  two  lines  known 
each  to  be  subject  to  chance  fluctuations.  As  a matter  of  fact 
if  one  counts  the  cases  in  Fig.  67  in  which,  between  two  consecutive 
points,  the  lines  have  slopes  in  the  same  direction,  and  the  cases  in 
which  the  slopes  are  in  opposite  directions,  it  is  found  that  in  four 
out of the nine possible cases (I-II, II-III, VI-VII, and IX-X) the
D and  S lines  have  opposite  slopes,  against  five  with  similar  slopes. 
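
The slope count just described can be checked mechanically. The following is a minimal sketch, assuming the ages are transcribed correctly from Table 43; the names `D_SERIES`, `S_SERIES`, `group_means`, and `slope_signs` are illustrative and do not come from the text.

```python
# Ages at marriage transcribed from Table 43, one list of five ages per
# group, in the order I to X. (Transcription is an assumption of this sketch.)
D_SERIES = [
    [22, 34, 35, 34, 25], [23, 30, 33, 36, 33], [30, 30, 26, 26, 31],
    [29, 26, 21, 26, 26], [30, 39, 28, 30, 35], [33, 45, 23, 32, 36],
    [28, 38, 41, 46, 38], [32, 27, 24, 32, 28], [31, 28, 33, 30, 28],
    [28, 25, 33, 50, 28],
]
S_SERIES = [
    [33, 25, 28, 31, 28], [29, 31, 23, 30, 27], [32, 30, 28, 27, 36],
    [31, 24, 26, 30, 25], [28, 35, 35, 29, 22], [28, 29, 45, 25, 35],
    [25, 37, 31, 32, 32], [23, 27, 30, 27, 31], [32, 28, 28, 27, 32],
    [24, 24, 33, 30, 29],
]


def group_means(series):
    """Average age at marriage for each group of five."""
    return [sum(group) / len(group) for group in series]


def slope_signs(means):
    """+1 where the line of means rises between consecutive groups, -1 where it falls."""
    return [(b > a) - (b < a) for a, b in zip(means, means[1:])]


d_means = group_means(D_SERIES)
s_means = group_means(S_SERIES)

pairs = list(zip(slope_signs(d_means), slope_signs(s_means)))
opposite = sum(1 for d, s in pairs if d * s < 0)  # slopes in opposite directions
same = sum(1 for d, s in pairs if d * s > 0)      # slopes in the same direction

print("D means:", [round(m, 1) for m in d_means])
print("S means:", [round(m, 1) for m in s_means])
print("opposite slopes:", opposite, " similar slopes:", same)
```

Counting slopes in this way is only a rough check on apparent parallelism; it says nothing about how large the individual rises and falls are.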

A conventional  measure  of  the  reliability  of  results,  or  put  the 
other  way  about,  of  their  “scatter”  due  to  the  chance  effects  of 
sampling,  is  used  by  statisticians  and  called  the  “probable  error.” 
It  is  a constant  so  chosen  that  when  its  value  is  added  to  and 
subtracted  from  the  result  obtained,  or  the  numeric  conclusion 
reached, it is exactly an even chance that the true result or
conclusion lies either inside or outside the limits set by the probable error
in the plus and minus direction. For example, if it is stated that
the mean age at death of persons dying in Baltimore is 39.83 ± 2.60
years, it means that the mathematical probability that the true
average age falls between 37.23 years (39.83 - 2.60) and 42.43
years (39.83 + 2.60) is exactly equal to the mathematical
probability that the true age falls outside those limits.
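
The "even chance" property can be illustrated numerically. This is a sketch only, and it assumes that the errors follow the normal curve, in which case the probable error is about 0.6745 times the standard deviation; the names `PE_PER_SIGMA` and `probability_within` are illustrative.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # normal curve with mean 0 and standard deviation 1

# Assumption: normally distributed errors. The probable error is then the
# deviation exceeded (in either direction) half the time, i.e. the upper
# quartile of the curve, about 0.6745 standard deviations.
PE_PER_SIGMA = STD_NORMAL.inv_cdf(0.75)


def probability_within(k_pe):
    """Chance that a normal error lies within plus or minus k_pe probable errors."""
    z = k_pe * PE_PER_SIGMA
    return STD_NORMAL.cdf(z) - STD_NORMAL.cdf(-z)


# The Baltimore example from the text: 39.83 +/- 2.60 years.
mean_age, probable_error = 39.83, 2.60
lower = mean_age - probable_error
upper = mean_age + probable_error

print(f"P.E. = {PE_PER_SIGMA:.4f} standard deviations")
print(f"limits: {lower:.2f} to {upper:.2f} years")
print(f"chance of lying inside the limits: {probability_within(1):.3f}")
```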

The  significance  of  any  result  is  to  be  judged  by  its  relation  to 
its  probable  error.  A simple  theorem  in  probability  tells  us  that 
the  probable  error  of  the  difference  between  any  two  independent 
quantities (i.e., quantities such that there is no correlation between
their  errors)  is  equal  to  the  square  root  of  the  sum  of  the  squares 
of  the  probable  errors  of  the  quantities  entering  into  the  difference. 
It will be perceived then that the probable error of a difference will
necessarily be larger than either of the two probable errors entering
into its determination. Every student of elementary geometry
knows that the hypotenuse of a right triangle is longer than
either of the other sides. The square of the hypotenuse is equal
to the sum of the squares of the other two sides, just as the square
of the probable error of a difference is equal to the sum of the squares
of the probable errors of the two quantities entering into the
difference. It should be particularly noted by the student that this
expression  for  the  probable  error  of  a difference  is  true  only  under 
the  particular  condition  stated  above,  as  to  absence  of  correlation 
of  errors.  The  general  formula,  true  in  all  cases,  is  given  in  the 
third  line  of  Table  61,  p.  361. 
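
In symbols, writing P.E._A and P.E._B for the probable errors of the two quantities, the rule for independent quantities may be written as in the first line below. The second line is the usual textbook form when the errors have a correlation r between them; it is offered here only as an aid, since Table 61 is not reproduced in this section and its notation may differ.

```latex
\[
  \mathrm{P.E.}_{A-B} = \sqrt{\mathrm{P.E.}_{A}^{2} + \mathrm{P.E.}_{B}^{2}}
  \qquad \text{(errors uncorrelated)}
\]
\[
  \mathrm{P.E.}_{A-B} = \sqrt{\mathrm{P.E.}_{A}^{2} + \mathrm{P.E.}_{B}^{2}
      - 2\, r\, \mathrm{P.E.}_{A}\, \mathrm{P.E.}_{B}}
  \qquad \text{(errors with correlation } r\text{)}
\]
```

With r = 0 the second expression reduces to the first.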

As  an  example  of  the  use  of  the  probable  error  of  a difference, 
suppose  that  a physician  found,  after  administering  a standard 
dose  of  a drug  to  a considerable  number,  say  150  people,  that  the 
pulse rate was 81.12 ± .20 beats per minute, while the normal
condition in the same group was 79.68 ± .15 beats per minute.
Would  he  be  justified  in  concluding  that  the  drug  significantly 
increased  the  heart  rate,  or  is  the  apparent  increase  simply  a 
result  of  chance,  arising  from  sampling?  We  have  the  following 
very  simple  calculation: 

\[ \text{Difference} = 81.12 - 79.68 = 1.44, \]

\[ (.20)^{2} + (.15)^{2} = .0400 + .0225 = .0625, \]

\[ \sqrt{.0625} = .25 \]

Or we see that the difference in the two cases is 1.44 ± .25. The
difference,  small  as  it  is  absolutely,  is  approximately  six  times  its 
probable  error.  Is  a difference  six  times  its  probable  error  likely 
to  arise  from  chance  alone,  or  does  it  represent  a really  significant 
difference? 
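
The arithmetic of this example is easily repeated. The sketch below assumes, as the calculation in the text does, that the errors of the two means are uncorrelated; the function name `pe_of_difference` is illustrative.

```python
import math


def pe_of_difference(pe_a, pe_b):
    """Probable error of A - B, assuming the errors of A and B are uncorrelated."""
    return math.sqrt(pe_a ** 2 + pe_b ** 2)


# Figures from the pulse-rate example in the text.
drug_mean, drug_pe = 81.12, 0.20
normal_mean, normal_pe = 79.68, 0.15

difference = drug_mean - normal_mean
pe_diff = pe_of_difference(drug_pe, normal_pe)
ratio = difference / pe_diff

print(f"difference = {difference:.2f} +/- {pe_diff:.2f}")
print(f"difference / P.E. = {ratio:.2f}")
```

The ratio comes to a little under six, which is the figure the discussion that follows takes up.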

There  has  grown  up  a certain  conventional  way  of  interpreting 
probable  errors,  which  is  accepted  by  many  workers.  It  has  been 
practically  a universal  custom  among  biometric  workers  to  say 
that  a difference  (or  a constant)  which  is  smaller  than  twice  its 
probable  error  is  probably  not  significant,  whereas  a difference 
(or  constant)  which  is  three  or  more  times  its  probable  error  is 
either  “certainly,”  or  at  least  “almost  certainly,”  significant. 

Now such statements as these derive whatever meaning they
may possibly have from the following simple mathematical
considerations. Assuming that the errors of random sampling are
distributed strictly in accordance with the normal or Gaussian
curve, which will be discussed in some detail in the next chapter,
it is a simple matter to determine from any table of the probability
integral the precise portion of the area of a normal curve lying
outside any original abscissal limits, or, in other words, the
probability of the occurrence of a deviation as great as or greater than
the assigned deviation. To say that a deviation as great as or greater
than three times the probable error is “certainly significant”
means, strictly speaking, that the area of the normal curve beyond
3 P.E. on either side of the central ordinate is negligibly small.
As a matter of fact this is not true, unless one chooses to regard
4.3 per cent. as a negligible fraction of a quantity. There are
certainly many common affairs of life in which it would mean
disaster to “neglect” a deviation of 4 per cent. of the total quantity
involved.
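
The figure of 4.3 per cent. can be recovered without a printed table of the probability integral. The following is a sketch under the same assumption of normally distributed errors; `area_outside` is an illustrative name, not one used in the text.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()
PE_PER_SIGMA = STD_NORMAL.inv_cdf(0.75)  # probable error in standard deviations


def area_outside(k_pe):
    """Fraction of the normal curve lying beyond plus or minus k_pe probable errors."""
    z = k_pe * PE_PER_SIGMA
    return 2.0 * STD_NORMAL.cdf(-z)  # both tails together


for k in (1, 2, 3, 4):
    print(f"{k} x P.E.: {100 * area_outside(k):.2f} per cent. of the area lies outside")
```

The four multiples chosen here are simply the first four whole numbers; any other multiple may be passed to the function in the same way.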

In  order  that  a more  adequate  conception  may  be  had  of  just 
what  the  probable  error,  and  various  multiples  of  it,  mean,  Figs. 
68  to  71  are  inserted  here.  They  show  the  areas  of  the  normal 
curve  inside  and  outside  certain  limits. 

From these diagrams one may perceive exactly what is meant
when he says, for example, that a difference which is three times its
probable error is certainly significant. He means that the sum of
the two cross-hatched areas in Fig. 70 is a wholly negligible
quantity in comparison with the blank area under the curve in the same
figure. Everyone will agree, after looking at Fig. 71, that a
conclusion based upon a difference four or more times its probable error is
practically safe, so far as concerns purely statistical considerations.

Fig. 68. The area of a normal curve inside (blank) and the area outside
(cross-hatched) the lower and upper quartiles. The quartiles are the points on the abscissa
where perpendiculars to the base cut off just one-quarter of the total area of the
curve at each end. By definition of the probable error given above, it is seen that
the quartile distance on the x axis is 1 × P.E. The sum of the two cross-hatched
areas is exactly equal to the blank area in the center.

Fig. 69. The area of a normal curve inside (blank) and outside (cross-hatched) the
limits set by twice the probable error.

Fig. 70. The area of a normal curve inside (blank) and outside (cross-hatched) the
limits set by three times the probable error.

Fig. 71. The area of a normal curve inside (blank) and outside (cross-hatched) the
limits set by four times the probable error.

Table A of Appendix III (p. 438) sets forth, for a series of ratios
between a statistical deviation and the “probable error” of the
distribution, first, the probability that a deviation as great as or
greater than the given one will occur, and second, the odds against
the occurrence of such a deviation. The probabilities are expressed
on a percentage basis, on the ground that they will probably in this
way make a more direct appeal to the average mind, since we are
more accustomed to thinking in terms of parts per 100 than per
any other number. A single example will indicate how the table
is to be used. Suppose one has determined the mean of each of two
comparable series of measurements. These means, which may be
called A and B, differ by a certain amount. The difference is found
to be, let us say, 3.2 times as large as the probable error of the
difference. Is one mean significantly larger than the other? Or,
put in another way, what is the probability that the difference arose
purely as a result of random sampling (as a result solely of chance)?
Under the argument 3.2 in the table we find the probability of the
occurrence of a deviation as great as or greater than this to be 3.09
per cent. This means that if, in the general population from which our
samples are drawn, the means A′ and B′ were truly and absolutely
identical, and we drew successively 100 pairs of samples of the size
which led to the two observed means, and took the difference
between the averages in the case of each of the 100 pairs, there would
be about 3 cases in the 100 trials in which the difference would be
as great as or greater than that actually found between the two
observed means A and B with which we started this discussion.
Or, from the next column, the odds against the occurrence of a
difference as great as or greater than this in proportion to its probable
error, are 31.36 to 1, if chance alone were operative in the
determination of the event. If one wants to call this “certainty” he
has a perfect right to do so. The table merely defines quantitatively
his particular conception of certainty.

It will be noted that after the ratio, deviation ÷ P.E., passes
3.0  the  odds  against  the  deviation  increase  rapidly,  reaching  a 
magnitude  at  8.0,  which  is,  practically  speaking,  beyond  any  real 
power  of  conception.  We  have  started  the  table  at  1.0  because 
this  is  the  point  where  the  chances  are  even.  A deviation  as  large 
as  the  probable  error  is  as  likely  to  occur  as  not. 

From  this  table  it  is  seen  that  a deviation  of  four  times  the 
probable  error  will  arise  by  chance  less  often  than  once  in  a hundred 
trials.  When  one  gets  a difference  as  great  or  greater  than  this 
he  may  conclude  with  reasonable  certainty  that  it  did  not  arise 
by  chance  alone,  but  may  have  significant  meaning. 
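
Entries of the kind found in such a table can be approximated directly. The sketch below again assumes normally distributed errors; it does not reproduce Table A itself, and its figures may differ from the printed ones in the last place because of rounding. The name `chance_and_odds` is illustrative.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()
PE_PER_SIGMA = STD_NORMAL.inv_cdf(0.75)  # probable error in standard deviations


def chance_and_odds(ratio):
    """For a given ratio deviation / P.E., return the percentage chance of a
    deviation at least that large (either direction) and the odds against it."""
    z = ratio * PE_PER_SIGMA
    p = 2.0 * STD_NORMAL.cdf(-z)
    return 100.0 * p, (1.0 - p) / p


for ratio in (1.0, 2.0, 3.0, 3.2, 4.0, 8.0):
    pct, odds = chance_and_odds(ratio)
    print(f"ratio {ratio:3.1f}: {pct:12.8f} per cent., odds against {odds:,.2f} to 1")
```

At a ratio of 1.0 the odds come out even, and at 3.2 they come out close to the 31.36 to 1 quoted in the worked example.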

CHAPTER XI


ELEMENTARY  THEORY  OF  PROBABILITY 
THE  TOSS  OF  A PENNY 

The  tossing  of  a coin  is  a classical  event  in  the  discussion  of 
probability.  Let  us  examine  somewhat  carefully  what  this  event 
consists  of  and  involves.  Consider  first  the  penny.  It  is  a simple 
mechanism,  but  possesses  two  important  structural  characteristics. 
These  are: 

1.  It  is  thin.  By  this  we  mean,  more  precisely,  that  it  is  a right 
cylinder,  having  its  height  very  small  as  compared  with  its  diameter. 

2.  The  two  ends  of  the  cylinder  which  we  call  a penny  are  so 
marked  as  to  be  distinguishable  from  one  another.  One  of  these 
ends  is  called  the  head,  the  other  the  tail. 

Now  the  general  experience  of  mankind  with  structures  like  a 
penny,  that  is,  with  exceedingly  short  cylinders,  is  that  only  in  one 
or  the  other  of  two  positions  are  they  in  stable  equilibrium.  These 
positions  are  respectively,  standing  on  the  head  end  or  standing 
on  the  tail  end.  Everyone  knows  that  a penny  on  its  edge  (which 
is  of  course  the  side  of  the  cylinder)  is  in  a highly  unstable  position, 
so  much  so  in  point  of  fact  that,  except  by  an  excess  of  precaution 
which  would  physically  be  exceedingly  difficult  and  expensive  of 
attainment,  a penny  will  not  stand  free  of  support  on  its  edge  for 
more  than  an  extremely  short  time.  Why  everyone  knows  this 
is  simply  and  solely  because  he  has  tried  it.  That  is,  his  personal 
and  racial  experience  with  machines  or  structures  like  pennies, 
and this experience alone, has taught him that they will not stand
on  edge.  No  amount  of  a priori  reasoning,  in  the  complete  absence 
of  experience,  could  safely  lead  to  this  conclusion. 

Since  pennies  then  always  do  come  to  rest  with  either  head  or 
tail  uppermost  following  any  disturbance  of  their  previous  state  of 
rest,  we  are  led  to  a further  question.  Is  there  anything  in  the 
structure  of  the  penny  which  makes  it  any  more  easy  for  it  to  come 
to  rest  after  a disturbance  of  its  prior  state  of  equilibrium  on  its 
head end than on its tail end, or vice versa? Again we call upon our
general experience of machines and structures. Plainly that
experience gives us no warrant for believing that the slight differences in the
pattern  of  the  two  ends  of  a penny  do,  in  fact,  sensibly  favor  either 
the  head  or  the  tail  position  of  equilibrium  in  any  particular  case. 

We  have  now  gained  two  important  results,  both  based  upon 
general  experience,  personal  and  racial.  They  are  that  when  a 
structure  like  a penny  comes  to  rest  after  a disturbance,  the  structure 
itself  determines  that  there  are  only  two  possible  positions  of  stable 
equilibrium,  and  that  there  is  nothing  in  the  structure  itself  which 
makes  one  of  these  any  easier  of  attainment  than  the  other. 

So  much  for  the  structure  of  the  penny.  Now  for  its  tossing. 
Tossing  can  be  interpreted  as  any  disturbance  of  a prior  state  of 
equilibrium.  Is  there  anything  in  the  tossing  which  makes  it  easier 
for  the  penny  to  come  to  rest,  when  it  does  so  come,  with  one  end 
rather  than  the  other  uppermost?  Plainly  this  depends  upon  how 
the  tossing  is  done.  Suppose  a penny  to  be  sitting  on  its  tail  end 
(that  is,  head  up)  on  the  desk  before  me.  If  I carefully  grasp  two 
opposite  points  of  its  periphery  between  my  thumb  and  forefinger 
and  raise  it  just  one  millimeter  from  the  table,  and  then  let  go,  it 
will  again  come  to  rest  with  head  up.  I can  repeat  this  performance 
industriously  forever,  and  it  will  always  come  to  rest  head  up. 
The  same  result  will  happen  if  I raise  it  just  two  millimeters  before 
I let  go.  How  do  I know  this?  From  past  experience  of  falling 
bodies  in  air,  and  in  particular  from  experience  of  excessively  short 
cylinders  falling  distances  less  than  their  diameter  in  air.  So  then 
we  see  that  it  is  possible  to  disturb  the  stable  equilibrium  of  a 
penny  at  rest,  and  have  it  always  return  to  the  same  position  of 
rest.  Equally  it  is  possible  so  to  disturb  the  penny  that  it  will 
always  return  to  the  opposite  position  of  equilibrium  to  that  which 
it  had  before.  I have  only  to  give  it  a sufficiently  strong  flip  at 
the  start  of  a fall  through  a distance  a little  more  than  its  own 
diameter  to  turn  it  over  just  once  in  the  course  of  that  fall. 

But  now  suppose  I drop  the  penny  from  a much  greater  height 
than  those  we  have  spoken  about;  or  literally  toss  it,  that  is,  pick  it 
up  from  the  table  and  throw  it  into  the  air;  or  set  it  spinning  like 
a top  on  its  edge;  or  roll  it  across  the  table  or  floor  on  its  edge. 
Then I have fundamentally altered the situation. No longer have
I disturbed  the  equilibrium  in  such  a way  as  to  make  it  easier  for 
the  penny  to  come  to  rest  on  one  of  its  ends  rather  than  the  other, 
as  was  the  case  in  the  examples  discussed  in  the  previous  paragraph. 
On  the  contrary,  by  these  operations  of  tossing  described  in  this 
present  paragraph,  I have  in  each  case  lost  control  of  the  future 
movements of the penny as soon as it leaves my hand. An
indefinitely large number of circumstances can influence its course before
it  comes  to  rest.  But  since  I cannot  control  these  circumstances, 
I call  them  random.  So  long  as  I could  control  the  circumstances 
I could  predict  with  positiveness  and  certainty  the  final  position 
of  rest  of  the  penny,  knowing  what  I did  about  its  structure.  Still 
knowing  just  as  much  as  before  about  the  structure  of  the  penny, 
and  it  being  just  as  fixed  and  determinate  as  before,  I have  lost  my 
power  of  prediction  because  I have  introduced,  in  the  tossing,  and 
only in the tossing, an element of randomness.

What  do  we  mean  by  randomness?  Only  this,  that  a penny 
tossed  at  random  is  one  tossed  in  such  a way  that  the  attainment  of 
one  of  the  possible  states  of  equilibrium  is  not  more  favored  than  the 
other  in  or  by  the  act  of  tossing.  Therefore,  since,  as  we  have  seen, 
the  structure  of  the  penny  does  not  favor  one  position  of  rest  more 
than  the  other,  and  the  method  of  tossing  does  not  favor  one  more 
than  the  other,  there  is  nothing  so  far  to  enable  us  to  assert,  on  the 
basis  of  what  is  known  by  experience,  that  the  penny  will  more 
often  come  to  rest  on  one  end  than  on  the  other  end. 

Can we then assert the opposite, namely, that the penny will,
under the conditions of structure and tossing named, come to rest
with the head end uppermost as often as with the tail end
uppermost? Here we come to a sharp division of opinion among students
of  the  foundation  of  the  theory  of  probability.  There  are  those 
who  maintain  that  solely  on  the  basis  of  experience  with  structures 
like  pennies  and  random  tossing,  or  even  without  experience  by 
pure  induction  from  the  structure  of  a penny  and  from  the  abstract 
idea  of  randomness,  we  are  able  by  a priori  reasoning  to  assert  that 
the  penny  tossed  at  random  will  come  to  rest  as  often  with  head 
uppermost as with tail. These persons, in short, assert that
fundamentally our notions of probability are purely a priori.

But this view overlooks, as it seems to me, a most important
consideration.  How  can  one  know  that  the  only  things  concerned 
in  determining  which  of  the  alternative  positions  of  equilibrium  of 
a penny  shall  eventuate,  are  things  related  solely  to  the  structure 
of  the  penny  and  the  randomness  of  the  tossing?  Plainly  he  cannot 
know  a priori.  In  fact  this  is  one  of  the  most  important  things  he 
wants  to  find  out  in  a research  on  penny  tossing.  A priori  one 
could  not  possibly  assert  that  there  might  not  be  some  wholly 
unknown  and  unperceived  cosmic  principle  influencing  the  coming 
to  rest  of  pennies.  At  not  so  remote  times  in  the  history  of  human 
thought  it  might  easily  have  been  solemnly  asserted  that  a demon, 
or  some  other  supernatural  agent,  interested  himself  in  penny 
tossing. 

And  today  the  only  way  to  prove  that  a demon  is  not  involved 
in  the  affair  is  to  try  the  case.  Now  what  is  found  when  one  tries 
it,  by  tossing  a normal  penny  a great  many  times  in  a random  way, 
is  that  in  fact  the  penny  comes  to  rest  in  the  long  run  just  about 
as  many  times,  and  no  more,  with  head  uppermost  as  with  tail 
uppermost.  But  this  is  just  what  would  be  expected  if  the  only 
things  concerned  were  the  structure  of  the  penny  and  the  randomness 
of  the  tossing.  Hence  it  may  reasonably  be  concluded,  on  the  basis 
of this experience, and on this basis alone, that there are no
supernatural agencies involved, and that in these two factors of structure
and  randomness  we  have  the  sole  essential  elements. 
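
The long-run behavior described here can be imitated on a computer, though only as an illustration: the sketch below builds the equal chance of head and tail into the simulated penny, so it shows what the two factors of structure and randomness would produce and is no substitute for the actual trial on which the argument rests. The function name `toss_pennies` and the fixed seed are assumptions of the sketch.

```python
import random


def toss_pennies(n_tosses, seed=0):
    """Simulate n_tosses random tosses of an idealized penny; return the number of heads."""
    rng = random.Random(seed)  # fixed seed so that the run can be repeated
    return sum(rng.random() < 0.5 for _ in range(n_tosses))


for n in (10, 100, 10_000, 100_000):
    heads = toss_pennies(n)
    print(f"{n:7d} tosses: {heads:6d} heads, proportion {heads / n:.4f}")
```

With few tosses the proportion of heads may stray well away from one-half; as the number of tosses grows it settles close to it.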

By  this  long  argument  I hope  it  has  been  made  clear  that  the 
only  basis  we  have  for  saying  that  when  a penny  is  tossed  at  random 
it  is  as  likely,  or  probable,  that  it  will  come  to  rest  with  the  head 
up  as  with  the  tail  up,  is  the  basis  of  experience.*  This  experience, 
summarized,  is  of  three  sorts: 

A. Experience of machines or structures like pennies, namely,
cylinders excessively short in proportion to their diameter.
This experience teaches that such structures can attain a
steady state of rest only when lying on one end or the
other, namely, with either head up or tail up.

B. Experience of random tossing; namely, of uncontrolled
phenomena, in which because of the lack of control one
outcome is not more favored than another. This experience
teaches that after a penny is randomly tossed the
tosser has lost all control of which end shall be uppermost,
head or tail, when it comes to rest.

C. Experience of tossing pennies many times. This experience
teaches that if a true penny is tossed many times it will
come to rest about one-half the times with the head up,
and about one-half of the times with the tail up.

* The student will find the same point of view which has been developed here as
to the experiential basis of our knowledge of probability expressed in more general
terms in the opening chapter of Professor Julian L. Coolidge’s text-book.⁸

THE  MATHEMATICS  OF  SIMPLE  PROBABILITY 

A penny  can  by  virtue  of  its  structure  come  to  rest  either  head 
up  or  tail  up.  Suppose  we  call  the  times  it  happens  the  first  of 
these  ways  a,  and  the  times  it  happens  the  second  b.  Therefore 
the total possible times it can come to rest will be a + b. If the
penny  is  tossed  at  random  it  is  as  likely  to  fall  the  a way  (i.  e.,  H) 
as  the  b way  (i.  e.,  T).  In  any  one  toss  but  one  actual  occurrence 
can  happen  (namely,  the  penny  must  come  to  rest  on  an  end,  not 
on  the  edge),  though  there  are  two  possible  ways  in  which  the 
occurrence  can  happen  (namely,  it  may  come  to  rest  on  either  the 
H or  the  T end).  The  mathematical  measure  of  simple  probability 
is taken as the ratio in which (1) the number of times a particular
specified event occurring at random in a class of events either has
happened, or by inference from actual experience of similar events
could have happened, is to (2) the whole number of times all kinds of
events possible in the class either have happened, or, by inference from
experience of similar events, could have happened.

The numerical appreciation or determination of actual
occurrences and of possible ways is, and must always be, based upon
experience;  but  this  experience  may  be  of  either  of  two  sorts, 
namely,  general  experience  of  particular  structures  (as  in  the  case 
of  the  penny),  or  particular  statistical  experience  of  events.  But, 
however  the  numerical  determination  is  derived,  the  form  of  the 
probability  statement  remains  the  same,  a ratio  or  fraction;  and 
no  greater  validity  necessarily  or  absolutely  inheres  in  the  one 
method of arriving at the numerical determination than in the
other,  so  far  as  the  resulting  probability  is  concerned. 

To  return  now  to  the  penny: 

The probability that after any one particular random toss a
penny will come to rest with the head end up is, upon the reasoning
given above,

\[ p = \frac{a}{a + b} \]

In any one particular toss of one penny clearly either

\[ a = 1, \quad \text{or} \quad b = 1 \]

and the whole number of possible ways in which the event can
happen is 1 + 1, whence

\[ p = \frac{a}{a + b} = \frac{1}{1 + 1} = \frac{1}{2} \]

Similarly, the probability that after any one particular toss it will
come to rest with the tail end up is

\[ q = \frac{b}{a + b} = \frac{1}{1 + 1} = \frac{1}{2} \]

\[ p + q = 1. \]

These results tell us that on any given single random toss of one
penny it is an even or equal chance (or probability) that the penny
will come to rest with head up. It is a certainty (p + q = 1) that
it will come to rest with either head or tail up.

Thus  in  the  numerical  expression  of  the  probability  of  resting 
with  head  up  after  one  random  toss,  the  numerator  of  the  fraction 
must  be  1 because  the  specifications  are  that  it  shall  be  head  up, 
and  not  otherwise.  The  denominator  must  be  2 because  the  whole 
number  of  possible  ways  is  either  head  or  tail  (=2). 

Suppose  the  penny  to  be  tossed  at  random  n times.  How  many 
times  out  of  the  n will  it  probably  come  to  rest  head  up  (H)? 

Plainly  pn,  because  one  toss  does  not  influence  the  next,  nor  the 
next,  nor  any  other  toss  whatever.  Therefore  the  number  of  H’s 
in  n trials  must  be  n times  the  probability  of  H on  one  trial,  which 
is ½, as we have seen.

Now  suppose  we  are  dealing  not  with  a particular  structure 
like a penny, but a series of events and wish to know the probability
of  occurrence  of  a particular  kind  of  event  in  this  series.  Following 
the  rule  that  the  probability  is  the  ratio  of  the  frequency  of  actual 
occurrences  of  the  specified  sort  to  the  total  number  of  possible 
ways,  we  count  in  the  statistical  experience  the  occurrences  of  the 
specified  kind  and  make  the  result  the  numerator  of  the  probability 
fraction,  and  count  the  total  number  of  all  occurrences  in  the 
universe  under  discussion  and  put  this  result  as  the  denominator. 

Example: On the basis of the experience of the U. S. Birth
Registration Area in 1919, what is the probability that any individual
baby born in that area will be a male?

In 1919 male births = 705,593 = a

In 1919 total births = 1,373,438 = a + b

Therefore the probability that a given birth would be male is

\[ p = \frac{705{,}593}{1{,}373{,}438} = .5137 \]

The chance that a given birth would be a female is

\[ q = 1 - p = 1 - .5137 = .4863 \]


Or  there  were  about  fifty-one  chances  in  a hundred  that  a given 
birth  would  be  of  a male. 
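
The same ratio can be formed in a few lines. This sketch uses only the two 1919 counts quoted in the example; the variable names are illustrative.

```python
# Counts quoted in the text for the U. S. Birth Registration Area, 1919.
male_births = 705_593     # a
total_births = 1_373_438  # a + b

p_male = male_births / total_births  # probability that a given birth is male
q_female = 1 - p_male                # probability that a given birth is female

print(f"p = {p_male:.4f}")
print(f"q = {q_female:.4f}")
print(f"about {100 * p_male:.0f} chances in a hundred of a male birth")
```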

The  principles  stated  above  regarding  the  fraction  which 
measures  probability  may  be  extended  to  any  number  of  mutually 
exclusive  events  equally  capable  of  happening.  Thus 


\[ p = \frac{a}{a + b + c + \cdots} \]

\[ q = \frac{b}{a + b + c + \cdots} \]

\[ r = \frac{c}{a + b + c + \cdots} \]

etc.

\[ p + q + r + \cdots = 1 \]


Example: What is the probability of drawing any number of
just three figures from the entire list of numbers which can be
formed from the first seven digits, it being specified that any digit
can be used but once in forming any number?

The number of different three figure numbers which can be
formed from the first seven digits is

$$210 = a$$

The whole number of different numbers (of 1 digit, 2 digits, etc.)
which can be listed from the first seven digits is

$$13{,}699 = a + b + c + d + e + f + g$$

Therefore

$$p = \frac{210}{13{,}699} \approx \frac{1}{65}$$

The probability of drawing any one particular three figure
number, say 123, is

$$p = \frac{1}{13{,}699}$$

But  at  this  point  some  one  will  say:  How  do  you  know  that 
just  210  different  three  figure  numbers  can  be  made  up  from  the 
first  seven  digits?  Or  that  the  total  of  different  numbers  of  all 
sizes  from  these  seven  digits  is  just  13,699? 

To answer these pertinent questions it will be necessary to ask
the reader to review briefly, as a digression from the main
probability argument, which under all the circumstances will perhaps be
pardoned, a small portion of his elementary college algebra, which
the medical man has perhaps forgotten.


PERMUTATIONS 

The number of different ways in which the three letters a, b,
and c can be arranged (or permuted) in groups of three is plainly

a b c
a c b
b a c
b c a
c a b
c b a

These  six  different  arrangements  are  the  permutations  of  three 
things  taken  three  at  a time. 




MEDICAL  BIOMETRY  AND  STATISTICS 


Generally we may write

$${}_nP_r = n(n-1)(n-2)(n-3)\cdots(n-r+1) = \frac{n!}{(n-r)!}$$

which means that the number of permutations of n things taken r
at a time (${}_nP_r$) is equal to factorial n, ($n!$), divided by factorial
n minus r, ($(n-r)!$).

From this it will be perceived that

$${}_nP_n = n!$$

which in the case of our three letter example becomes

$${}_3P_3 = 3 \times 2 \times 1 = 6,$$

just precisely the result we got experimentally.
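
The agreement between the formula and the written-out list may be checked mechanically. The following is a minimal sketch using Python's standard library; the helper name `n_p_r` is an assumption made for illustration.

```python
from itertools import permutations
from math import factorial, perm


def n_p_r(n: int, r: int) -> int:
    """Permutations of n things taken r at a time: n! / (n - r)!."""
    return factorial(n) // factorial(n - r)


# Write out every arrangement of the three letters, as in the text.
arrangements = list(permutations(["a", "b", "c"], 3))
for arrangement in arrangements:
    print(" ".join(arrangement))

print(len(arrangements), n_p_r(3, 3))  # both counts should agree
print(n_p_r(7, 3), perm(7, 3))         # three figure numbers from seven digits
```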

The  total  number  of  permutations  of  n things  taken  singly, 
by  twos,  by  threes,  etc.,  is  found  by  summing  nPr  for  all  values  of 
r from  1 to  n. 

Call this $\sum {}_nP_r$.

Then it can be proved that

$$\sum {}_nP_r = n! + \frac{n!}{1} + \frac{n!}{1 \cdot 2} + \frac{n!}{1 \cdot 2 \cdot 3} + \cdots + \frac{n!}{(n-1)!} = n!\left(1 + \frac{1}{1!} + \frac{1}{2!} + \frac{1}{3!} + \cdots + \frac{1}{(n-1)!}\right).$$

It can further be shown that the series in the parenthesis
approximates more and more closely in value the longer it is, to a
number conventionally called e, which is the base of the Napierian
system of logarithms, and has the value

$$e = 2.7182818\ldots$$

Hence it follows that for large values of n

$$\sum {}_nP_r = e \cdot n! \text{ approximately.}$$

The question at once arises: How large does n have to be to
make this approximation close enough for practical statistical
purposes? The answer can be given by an example.

When n = 9, obviously not an excessively large number,

$\sum {}_nP_r = 986{,}410$, by the $e \cdot n!$ approximation,

$\sum {}_nP_r = 986{,}409$, exactly.
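
The exact sum and the approximation by e times factorial n can be compared directly. This is a sketch only; the function names are assumptions of the illustration.

```python
from math import e, factorial, perm


def sum_of_permutations(n: int) -> int:
    """Exact sum of nPr for r = 1, 2, ..., n."""
    return sum(perm(n, r) for r in range(1, n + 1))


def e_factorial_approximation(n: int) -> float:
    """The approximation e * n! to the same sum."""
    return e * factorial(n)


for n in (7, 9):
    exact = sum_of_permutations(n)
    approx = e_factorial_approximation(n)
    print(n, exact, round(approx))
```

For n = 7 the exact sum is the 13,699 used in the three figure number example, and for n = 9 the two values differ by about one unit in nearly a million.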


For  the  convenience  of  the  reader  a brief  table  of  permutations 
and  their  sums  is  given  as  Table  44. 


TABLE 44

Values of Permutations

Permutations of n things (columns) taken r at a time (rows)

  r \ n         10          9          8         7        6      5     4     3     2     1

    1           10          9          8         7        6      5     4     3     2     1
    2           90         72         56        42       30     20    12     6     2
    3          720        504        336       210      120     60    24     6
    4        5,040      3,024      1,680       840      360    120    24
    5       30,240     15,120      6,720     2,520      720    120
    6      151,200     60,480     20,160     5,040      720
    7      604,800    181,440     40,320     5,040
    8    1,814,400    362,880     40,320
    9    3,628,800    362,880
   10    3,628,800

 Σ nPr   9,864,100    986,409    109,600    13,699    1,956    325    64    15     4     1


COMBINATIONS 

How many different combinations of three letters each can be
made from the four letters a, b, c, and d? This is not the same
problem as before. Now each combination of three letters must
be different, not in respect of the order of the letters, but merely
of the letters themselves. Thus only one of the combinations
abc and cab can be used, because each contains the same letters,
a, b, and c.

Writing down the possibilities we get

a b c
a b d
a c d
b c d

No other combination can be written which will not contain,
in some arrangement, the same three letters that are in one or
another of the four groups above.

Using a similar notation to that of permutations we have

$${}_nC_r = \frac{n!}{r!\,(n-r)!}$$

which tells us how to find the number of different combinations of
n things taken r at a time. The example of the letters becomes,

$${}_4C_3 = \frac{4 \times 3 \times 2 \times 1}{(3 \times 2 \times 1) \times (1)} = \frac{24}{6} = 4,$$

which again coincides with the experimental result. In passing
it may be noted that if r be put equal to n we have

$${}_nC_n = \frac{n!}{n!} = 1$$


which  again  is  reasonable,  since  obviously  only  one  combination  of 
n things  taken  all  together  can  possibly  be  made. 
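
As with permutations, the formula for combinations can be set beside a direct listing. A brief sketch follows; the helper name `n_c_r` is assumed for illustration, and `math.comb` is the standard library's own version of the same quantity.

```python
from itertools import combinations
from math import comb, factorial


def n_c_r(n: int, r: int) -> int:
    """Combinations of n things taken r at a time: n! / (r! (n - r)!)."""
    return factorial(n) // (factorial(r) * factorial(n - r))


# List the groups of three letters drawn from a, b, c, d, ignoring order.
groups = ["".join(group) for group in combinations("abcd", 3)]
print(groups)

print(n_c_r(4, 3))                # number of such groups
print(n_c_r(5, 5))                # all n things taken together
print(n_c_r(10, 5), comb(10, 5))  # the two routes should agree
```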

For  the  sum  of  combinations,  that  is,  the  total  combinations  of 
n things  taken  singly,  by  twos,  etc.,  we  have 


$$\sum {}_nC_r = n + \frac{n(n-1)}{1 \cdot 2} + \frac{n(n-1)(n-2)}{1 \cdot 2 \cdot 3} + \cdots + n + 1$$

But the right-hand side of the equation, as will appear from the
discussion of the binomial theorem in a later section, is

$$(1 + 1)^n - 1.$$

Hence

$$\sum {}_nC_r = 2^n - 1.$$


Again,  for  the  sake  of  convenience,  a brief  table  of  combinations 
is  inserted  as  Table  45. 


TABLE 45

Values of Combinations

Combinations of n things (columns) taken r at a time (rows)

  r \ n      10      9      8      7      6      5      4      3      2      1

    1        10      9      8      7      6      5      4      3      2      1
    2        45     36     28     21     15     10      6      3      1
    3       120     84     56     35     20     10      4      1
    4       210    126     70     35     15      5      1
    5       252    126     56     21      6      1
    6       210     84     28      7      1
    7       120     36      8      1
    8        45      9      1
    9        10      1
   10         1

 Σ nCr    1,023    511    255    127     63     31     15      7      3      1

It will be noted from this table that in each column the values
rise to a maximum and then decline.


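The maximum ${}_nC_r = \dfrac{n!}{\left[\left(\tfrac{n}{2}\right)!\right]^2}$ when n is even.

The maximum ${}_nC_r = \dfrac{n!}{\left(\tfrac{n+1}{2}\right)!\,\left(\tfrac{n-1}{2}\right)!}$ when n is odd.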


Approximations to n!

In all practical work with probability it is useful to have an
easily computed approximation to the value of $n!$ in cases when
n is large. In Pearson’s “Tables for Statisticians and Biometricians”
and also in Glover’s “Tables of Applied Mathematics” a
table of $\log n!$ for n = 1 to 1000 is given. But for still higher
values an approximation is needed. A number of such formulae
are available.

Stirling’s:

$$n! = \sqrt{2\pi n}\; n^n e^{-n} \left(1 + \frac{1}{12n} + \frac{1}{288n^2} + \cdots\right)$$

Forsyth’s:

$$n! = \sqrt{2\pi} \left(\frac{\sqrt{n^2 + n + \tfrac{1}{6}}}{e}\right)^{n + \frac{1}{2}}$$

This is accurate to $\dfrac{1}{240 n^3}$.

To indicate the closeness of such approximations we may
calculate $n!$ for n = 2.

The result is

$n! = 1.999479$ (Forsyth)

Actual error = .000521

$\dfrac{1}{240 n^3} = .00052083$
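
The comparison for n = 2 can be repeated numerically. The sketch below codes the two formulae as they are written above, with Stirling's series cut off after the term in 1/(288 n²); the function names are assumptions of the illustration.

```python
from math import e, factorial, pi, sqrt


def stirling_factorial(n: float) -> float:
    """Stirling's series, truncated after the 1/(288 n^2) term."""
    correction = 1 + 1 / (12 * n) + 1 / (288 * n**2)
    return sqrt(2 * pi * n) * n**n * e ** (-n) * correction


def forsyth_factorial(n: float) -> float:
    """Forsyth's approximation: sqrt(2 pi) * (sqrt(n^2 + n + 1/6) / e)^(n + 1/2)."""
    return sqrt(2 * pi) * (sqrt(n * n + n + 1 / 6) / e) ** (n + 0.5)


n = 2
exact = factorial(n)
forsyth = forsyth_factorial(n)

print(f"Forsyth:      {forsyth:.6f}")
print(f"actual error: {exact - forsyth:.6f}")
print(f"1/(240 n^3):  {1 / (240 * n**3):.8f}")
print(f"Stirling:     {stirling_factorial(n):.6f}")
```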


Hence it may be concluded that for all practical purposes
Forsyth’s approximation is sufficiently accurate. It is, in the
opinion probably of most computers, somewhat easier and quicker
of calculation than Stirling’s approximation.

THE PROBABILITY OF CONCURRENT EVENTS

Suppose  this  question  is  put:  If  two  pennies  are  tossed  at 
random  together,  what  is  the  probability  that  both  will  show  heads 
when  they  come  to  rest? 

What are the possibilities? Let us call one of the pennies A
to distinguish it from the other B. Then we have, as possibilities,

AH, BH
AH, BT
AT, BH
AT, BT.

From this it appears that the favorable event AH, BH, can
occur in but one way, out of a total of four ways in which any event
may happen under the specifications (namely, of two pennies tossed
together). Hence

$$p = \tfrac{1}{4}$$

The probability that the pennies will fall one head and one tail
is evidently,

$$p = \tfrac{1}{2}.$$

Now let us consider these results analytically. Any one throw
of the two pennies must necessarily result in a combination of the
character A -, B -, where the dashes may be either H or T.
But considering the A penny alone, the probability that it will be
AH after any particular toss is, as we have already seen, $\tfrac{1}{2}$. This
means that in n successive tosses of the A penny alone it will come
AH approximately one-half of the times and AT one-half of the
times. This fact will not be altered by virtue of the fact that B is
tossed with A, because if the tossing is random neither penny affects
the other. Consequently it must happen that in about one-half of
n tosses of the two together the constitution of the result must be
of the form AH, B -, or numerically the result will be $\tfrac{1}{2}n$ AH,
B -. But now the B penny, which is associated with each of these
AH pennies in the $\tfrac{1}{2}n$ throws, will be subject to the same influences
as though it were tossed alone. Consequently we shall have in
these $\tfrac{1}{2}n$ tosses these results:

$\tfrac{1}{2}(\tfrac{1}{2}n)$ AH, BH, and
$\tfrac{1}{2}(\tfrac{1}{2}n)$ AH, BT.

But $\tfrac{1}{2}(\tfrac{1}{2}n)$ AH, BH = $\tfrac{1}{4}n$ AH, BH.

Continuing, let us consider next the one-half of the n tosses in
which the A penny falls T. By the same reasoning as before, we
shall get

$\tfrac{1}{2}(\tfrac{1}{2}n)$ AT, BH, and
$\tfrac{1}{2}(\tfrac{1}{2}n)$ AT, BT.

But the $\tfrac{1}{4}n$ AH, BT, and $\tfrac{1}{4}n$ AT, BH clearly must be added
together, since they are the cases in which head and tail occur
together, and it makes no difference which penny is head or which
tail, so that we have for the probability of the two pennies falling
one head and one tail,

$\tfrac{1}{2}n$ (AH, BT or AT, BH).

So, then, the complete result is,

$\tfrac{1}{4}n$ AH, BH = 2 heads,
$\tfrac{1}{2}n$ (AH, BT or AT, BH) = 1 head, 1 tail,
$\tfrac{1}{4}n$ AT, BT = 2 tails.

Whence  we  arrive  at  the  rule: 

If the separate probabilities of each of several independent events
are respectively $p_1$, $p_2$, $p_3$, ... the probability of their all
occurring together is

$$P = p_1 \times p_2 \times p_3 \cdots$$
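
The rule can be checked against a plain enumeration of the two-penny case. The following is a small sketch; `joint_probability` is a name assumed for illustration, and exact fractions are used so that the results are not blurred by rounding.

```python
from fractions import Fraction
from itertools import product
from math import prod


def joint_probability(probabilities):
    """Probability that several independent events all occur together."""
    return prod(probabilities, start=Fraction(1))


half = Fraction(1, 2)
by_rule = joint_probability([half, half])  # penny A heads and penny B heads

# Enumerate the four equally likely results for pennies A and B.
outcomes = list(product("HT", repeat=2))
two_heads = Fraction(sum(1 for o in outcomes if o == ("H", "H")), len(outcomes))
one_of_each = Fraction(sum(1 for o in outcomes if set(o) == {"H", "T"}), len(outcomes))

print(by_rule, two_heads, one_of_each)
```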

The concurrence of events implied in this rule and the discussion
which has led up to it may be either in time, or in space but not in
time, or in both space and time. Thus in the case of tossing two
pennies together, the probability of $\tfrac{1}{4}$ that they will fall HH would
plainly not be affected in any way if one of the pennies were tossed
say a fraction of a second later than the other, nor, indeed, if it
were tossed several seconds, or minutes, or days, or any other
time unit, later, provided, as always, that all the tossing was random
in character. Hence it is seen that the probability of HH with two
pennies is the same, $\tfrac{1}{4}$, whether they are tossed together or
successively.

The simple theorems in probability so far developed have many
practical applications in medical work. An example from actual
experience may be given in illustration:

A physician has seen in the whole of his lifetime’s practice
23,464 patients. Of these patients, 1474 had some disease of the
gall-bladder or ducts. Also of the same 23,464 patients 454 had
glycosuria from some cause or other. Of the 454 patients
exhibiting glycosuria, 372 were cases of diabetes mellitus. Now in
the whole experience 24 patients exhibited both disease of the
gall-bladder and glycosuria, and 13 had both gall-bladder disease and
diabetes mellitus.

The  question  now  is:  Were  gall-bladder  disease  and  glycosuria 
more  or  less  often  associated  together  in  this  series  than  would  be 
expected  if  chance  or  random  association  were  the  only  influence 
bringing  them  together? 

In the experience of this physician the probability that a patient
had disease of the gall-bladder and ducts was

$$p_1 = \frac{1474}{23{,}464} = .0628$$

The probability that a patient had glycosuria was

$$p_2 = \frac{454}{23{,}464} = .0193$$

The probability of a patient having both gall-bladder disease
and glycosuria was

$$P = p_1 \times p_2 = .0628 \times .0193 = .001212$$

There would then be expected, from random assortment of
diseases alone, in this series a total of

$$23{,}464 \times .001212 = 28.4$$

patients  showing  both  these  morbid  conditions.  Actually  there 
were  24  such  patients.  Whence  we  may  at  once  conclude  that  the 
association  of  the  gall-bladder  disease  and  glycosuria  observed 
in  this  series  of  23,464  patients  was  approximately  what  might 
have  been  expected  from  the  operation  of  chance  alone. 

The case for diabetes mellitus and gall-bladder disease is
somewhat different.

Here

$$p_1 = .0628 \text{ as before}$$

$$p'_2 = \frac{372}{23{,}464} = .0159$$

$$P = p_1 \times p'_2 = .000999$$

and the number of cases expected is

$$23{,}464 \times .000999 = 23.4$$


while  actually  only  13  occurred  with  the  combination.  Hence  it 
may  be  concluded  that  in  this  series  diabetes  mellitus  and  diseases 
of  the  gall-bladder  and  ducts  actually  occurred  together  in  the  same 
patients  only  slightly  more  than  half  as  often  as  they  would  be 
expected  to  from  chance  alone. 
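
The two expectations may be recomputed from the counts themselves. This is a sketch under the same assumption of chance association; the function name is hypothetical. Because the code does not round the separate probabilities to four places first, it gives about 28.5 and 23.4 rather than exactly the 28.4 and 23.4 found above.

```python
def expected_joint_count(count_a: int, count_b: int, total: int) -> float:
    """Expected number showing both conditions if they associate by chance alone."""
    p_a = count_a / total
    p_b = count_b / total
    return total * p_a * p_b


total_patients = 23_464
gall_bladder = 1_474
glycosuria = 454
diabetes = 372

expected_glycosuria = expected_joint_count(gall_bladder, glycosuria, total_patients)
expected_diabetes = expected_joint_count(gall_bladder, diabetes, total_patients)

print(f"gall-bladder and glycosuria: expected {expected_glycosuria:.1f}, observed 24")
print(f"gall-bladder and diabetes:   expected {expected_diabetes:.1f}, observed 13")
```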

THE  POINT  BINOMIAL 

Let us now consider what will happen in n trials regarding an
event for which the probability of occurrence is p, and the
probability of failure is $q = 1 - p$.

1. The probability that the event will occur at every trial is
evidently

$$p \times p \times p \times p \cdots = p^n$$

Thus if we toss together at random four pennies the probability
that they will fall all heads, HHHH, is

$$\tfrac{1}{2} \times \tfrac{1}{2} \times \tfrac{1}{2} \times \tfrac{1}{2} = \left(\tfrac{1}{2}\right)^4 = \tfrac{1}{16}$$


2. The probability that in any one throw $n - 1$ particular
pennies will give successes (say heads) and one particular penny a
failure (tail) is

$$p \times p \times p \times \cdots \times q = p^{n-1} q$$

But this result can occur n different ways, as is plain from the
four pennies, which may give three heads and one tail, as follows:

H H H T
H H T H
H T H H
T H H H

Hence the complete probability that the event will occur
$n - 1$ times and fail once is

$$n\,p^{n-1} q$$

or in the penny case

$$4 \left(\tfrac{1}{2}\right)^3 \left(\tfrac{1}{2}\right) = \tfrac{1}{4}$$


3. The probability that in any one throw $n - 2$ particular
pennies will give successes and 2 particular pennies failures is

$$p \times p \times \cdots \times q \times q = p^{n-2} q^2$$

But again this may happen in

$$\frac{n(n-1)}{1 \cdot 2} = {}_nC_2$$

different ways (remembering that in the formula given above for nCr
some factors cancel in numerator and denominator), as can be seen
from the example of tossing four pennies, where the combination of
two heads and two tails may occur as follows:

H H T T
H T H T
H T T H
T H H T
T H T H
T T H H

Hence the complete probability of the event occurring $n - 2$
times and failing twice is

$$\frac{n(n-1)}{1 \cdot 2}\, p^{n-2} q^2,$$

which in the penny example is

$$\frac{4 \cdot 3}{1 \cdot 2} \left(\tfrac{1}{2}\right)^2 \left(\tfrac{1}{2}\right)^2 = \tfrac{3}{8}$$


4.  And  so  the  same  process  may  be  continued.  But  enough 
detail  has  been  presented  to  make  it  evident  that: 

If n trials be made of an event for which the probability of
occurrence is p and the probability of failure is q, the probability of each of
the several possible occurrences is given by the appropriate term in the
expansion of the binomial

$$(p + q)^n.$$

5. If $p = q = \tfrac{1}{2}$, as in the case of the penny, the point binomial
will be symmetric, as shown in Fig. 72, which gives the results for
the four-penny example.
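
The successive terms of the binomial for the four-penny example can be written out by machine. The sketch below is illustrative only; `point_binomial` is an assumed name, and the terms are ordered as in the text, from all successes down to no successes.

```python
from fractions import Fraction
from math import comb


def point_binomial(n: int, p: Fraction) -> list:
    """Terms of (p + q)^n, from n successes and 0 failures down to 0 successes."""
    q = 1 - p
    return [
        comb(n, failures) * p ** (n - failures) * q**failures
        for failures in range(n + 1)
    ]


pennies = point_binomial(4, Fraction(1, 2))
for failures, term in enumerate(pennies):
    print(f"{4 - failures} heads, {failures} tails: {term}")
```

The same function accepts any p, so it serves equally for cases in which p and q are unequal.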

Fig. 72. The results of tossing four pennies together at random, as given by the
binomial $(\tfrac{1}{2} + \tfrac{1}{2})^4$. [Abscissae: 4 heads, no tail; 3 heads, 1 tail; 2 heads,
2 tails; 1 head, 3 tails; no heads, 4 tails.]

But within fairly wide limits p and q may have any values.
Thus consider the results of throwing four dice together. In the
case of dice the probability of any particular face of the die coming
up after one random throw of one die is

$$p = \tfrac{1}{6}$$

whence

$q = \tfrac{5}{6}$ = the probability that this particular face will not come up.

Hence for the probabilities of getting different numbers of 6’s
with 4 dice thrown together at random we require the successive
terms of

$$\left(\tfrac{1}{6} + \tfrac{5}{6}\right)^4$$

These are:

$p^n = \tfrac{1}{1296}$ = probability that all 4 dice will fall with the 6 face up.

$n\,p^{n-1} q = \tfrac{20}{1296}$ = probability that 3 dice will fall 6’s and 1 die something other than 6.

$\frac{n(n-1)}{1 \cdot 2}\, p^{n-2} q^2 = \tfrac{150}{1296}$ = probability that 2 dice will fall 6’s, and the other 2 something other than 6.

$\frac{n(n-1)(n-2)}{1 \cdot 2 \cdot 3}\, p^{n-3} q^3 = \tfrac{500}{1296}$ = probability that 1 die will fall 6 and the other three something else.

$\frac{n(n-1)(n-2)(n-3)}{1 \cdot 2 \cdot 3 \cdot 4}\, p^{n-4} q^4 = \tfrac{625}{1296}$ = probability that no die will fall 6.

Fig. 73. The probability of getting different numbers of 6’s in the throws of 4 dice
together, as given by $(\tfrac{1}{6} + \tfrac{5}{6})^4$.

This distribution is shown graphically in Fig. 73, and its
asymmetry or skewness is apparent.

The student must bear always in mind in connection with the
graphical representations of the point binomial in this section and
elsewhere, that the terms of the binomial are true ordinates, and not
frequency areas. Consequently the lines connecting the circles
to form a polygon are not a correct representation of actuality.
Theoretically the circles in such a diagram as Fig. 73 stand alone
by themselves. The lines are put in simply as a convenience, to
enable the eye to get the sweep of the ordinates as a whole.

6. The probability of an event occurring t or more times in n
trials is the sum of the terms of $(p + q)^n$ from $p^n$ up to the term
in $p^t q^{n-t}$.

The  consequences  and  usefulness  of  this  proposition  are  far 
reaching  and  will  bear  careful  examination. 

Let  us  start  with  an  example.  Suppose  ten  pennies  to  be  tossed 
together  at  random.  For  the  results  we  have 

$$\left(\tfrac{1}{2} + \tfrac{1}{2}\right)^{10} = \frac{1 + 10 + 45 + 120 + 210 + 252 + 210 + 120 + 45 + 10 + 1}{1024}$$

These  fractions  are  reduced  to  decimals  in  Table  46. 


TABLE 46

Successive Terms of $(\tfrac{1}{2} + \tfrac{1}{2})^{10}$

 Ordinal number     Value of      Term measures the probability that
 of term            term          there will be in any one throw

    1               .000977       10 heads,  0 tail
    2               .009766        9 heads,  1 tail
    3               .043945        8 heads,  2 tails
    4               .117187        7 heads,  3 tails
    5               .205078        6 heads,  4 tails
    6               .246094        5 heads,  5 tails
    7               .205078        4 heads,  6 tails
    8               .117187        3 heads,  7 tails
    9               .043945        2 heads,  8 tails
   10               .009766        1 head,   9 tails
   11               .000977        0 head,  10 tails

 Total             1.000000


There  is  then  about  one  chance  in  a thousand  that  on  any  one 
throw  the  10  pennies  will  all  fall  head.  There  is  approximately 
one  chance  in  four  that  there  will  be  5 heads  and  5 tails  on  any  one 
throw,  and  so  on. 

The ordinates of Table 46 are plotted in Fig. 74.

What, now, is the probability that on any one throw there will
fall six or more heads? By the rule given above, and obviously
from general principles discussed earlier, this probability is:

   The probability for  6 heads  =    .205078
 + The probability for  7 heads  =  + .117187
 + The probability for  8 heads  =  + .043945
 + The probability for  9 heads  =  + .009766
 + The probability for 10 heads  =  + .000977

 Complete probability of 6 or more heads on one throw = .376953

Fig. 74. The binomial $(\tfrac{1}{2} + \tfrac{1}{2})^{10}$. The meaning of the cross-hatched area is explained
in the text.
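
The summation of ordinates for six or more heads can be sketched as a short routine. The function name `probability_at_least` is an assumption of the illustration; the figures used are those of the ten-penny example.

```python
from math import comb


def probability_at_least(t: int, n: int, p: float) -> float:
    """Sum of the binomial terms for t, t + 1, ..., n occurrences in n trials."""
    q = 1.0 - p
    return sum(comb(n, k) * p**k * q ** (n - k) for k in range(t, n + 1))


tail = probability_at_least(6, 10, 0.5)
print(f"{tail:.6f}")  # 0.376953
```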


Or, it appears that there are approximately thirty-eight chances
in one hundred, or a little more than one in three of throwing 6 or
more heads at one toss of the 10 pennies. In the diagram the
cross-hatched portion shows the ordinates summed. The ratio of
the area of the cross-hatched portion to the total area is, for reasons
which will appear in the next section, approximately that of the
total probability of .38 given above.

7. In all of the discussion of the point binomial so far nothing
has been said specifically about abscissas. The discussion has been
wholly about ordinates, and in the tables and diagrams we have
simply named in words the situation relative to the pennies at
each point at which an ordinate was erected. But this is plainly
not a neat or complete procedure. It is time now to see if
something different cannot be done relative to abscissas.

Consider the symmetric binomial, where $p = q = \tfrac{1}{2}$. The
structure resulting from its expansion is a series of points, which
if connected by lines as we have done, form a polygon, shaped like
a Napoleonic cocked hat, the line rising from each end to a peak
in the middle. Now suppose instead of designating each abscissal
point at which an ordinate is erected by a descriptive term, such
as “6 heads, 4 tails,” we measure the distance of each such point
from the center of the polygon where the highest ordinate is (or
when n is odd, from a point half-way between the two equal
central ordinates), using as the yardstick for the measurement
some function of the shape of the curve, or of the spread of its two
limbs. Every one is bound to agree that such a procedure would
be fair enough, provided the yardstick were at hand.

Now several such yardsticks are available, and have, indeed,
been used at different times in the history of the subject. The
one which has at the present time come to be almost universally
used, because of its significance in the higher mathematical
development of the subject, is

$$\sigma = \sqrt{n\,p\,q}$$

This quantity, which is perceived to be easily calculated, and
which for the present we shall call simply by its symbol sigma, will
be more fully discussed in a later chapter, and its mechanical and
geometric meaning explained. Here it need only be pointed out
that every point on the abscissal axis can be numerically defined as
some multiple of $\sigma$ since it itself is a distance along that axis.

So then we may set up Table 46 in another form, as shown in
Table 47.

TABLE 47

Abscissae in Terms of $x/\sigma$, and Ordinates of $(\tfrac{1}{2} + \tfrac{1}{2})^{10}$

      x/σ            y

  - 3.162278      .000977
  - 2.529822      .009766
  - 1.897366      .043945
  - 1.264911      .117187
  -  .632456      .205078
    0             .246094
  +  .632456      .205078
  + 1.264911      .117187
  + 1.897366      .043945
  + 2.529822      .009766
  + 3.162278      .000977

Normally, of course, one would never carry so many places of
decimals in $x/\sigma$. But this example will indicate that the position
of any abscissal point can be expressed in terms of $\sigma$ with any
desired degree of accuracy.
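
The abscissae of Table 47 follow from the ordinates of Table 46 once the yardstick is computed. A minimal sketch, in which the variable names are assumptions and the deviation of each point is measured from the central ordinate at np:

```python
from math import comb, sqrt

n, p = 10, 0.5
q = 1 - p
sigma = sqrt(n * p * q)  # the yardstick, sqrt(npq)
center = n * p           # position of the highest ordinate

# One row per ordinate: deviation from the center in units of sigma, and y.
rows = [
    ((k - center) / sigma, comb(n, k) * p**k * q ** (n - k))
    for k in range(n + 1)
]

print(f"sigma = {sigma:.6f}")
for x_over_sigma, y in rows:
    print(f"{x_over_sigma:+.6f}  {y:.6f}")
```

The last decimal place may differ by a unit from the printed table according as the figures are rounded or cut off.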

THE  NORMAL  CURVE 

It has been pointed out that to get the probability that an
event will occur t or more times in n trials it is necessary only to
sum the terms of the binomial up to the one in $p^t q^{n-t}$. This is
a simple enough matter when n is small or, at any rate, not very
large. But how if one is confronted with this problem? Suppose
a city to have 10,000 births per annum, and further suppose that
long experience of that city has demonstrated, on the average,
that the probability of any given birth being of a male is p = .52.
What is the probability that in a given year, say next year, there
will be born 5300 or more male babies? To answer this by the
point binomial route requires the calculation and summing of the
successive terms in the binomial $(.52 + .48)^{10{,}000}$ from the end
of the curve to the term in which p has the exponent 5300. Plainly
the labor involved in this procedure would far outweigh any possible
significance which could attach to the result.

Let us examine what happens as the exponent n of the binomial
increases in value. Figure 75 shows this graphically for a small
range of values of n, but a sufficient number to bring out the point.
In plotting this diagram all the deviations are taken in the form
$\pm(x/\sigma)$, and the sums of the ordinates of all the polygons are made
the same.

Fig. 75. Point binomials for several values of n, and a superimposed normal curve.

Now what this diagram shows is that, as n increases, the polygon
got by connecting the tops of the ordinates of the binomial
increases its number of sides as would be expected. Furthermore,
the binomial approaches in its form closer and closer to the smooth
curve as n increases. Now suppose n to increase indefinitely in
value. The resulting polygon would come closer and closer to the
smooth curve, but would never quite reach it because, after all,
however large n might be, if it were still finite, the resulting figure
would still be a polygon, that is, made up of many short but still
straight sides, whereas the curve is everywhere curving.

But suppose we went on to the binomial

$$\left(\tfrac{1}{2} + \tfrac{1}{2}\right)^{\infty}$$

Then each side of the “polygon” would be infinitely short,
corresponding to a point in a smooth curve, and each such point
may be thought of as a straight line of infinite shortness.
Furthermore, each ordinate of this “polygon” would be infinitely close to
the next one. This “polygon” would then have come to coincide
exactly with the smooth curve, and, in short, have become identical
with it.

In other words, the smooth curve is what is known
mathematically as the limit of the point binomial, as n of the binomial increases.
But this result opens out wonderful possibilities. For, plainly, if
the equation to the smooth curve is known it can be integrated over
any portion of its range. These integrations may be performed
once and for all, for this curve reduced to standard area of say 1,
and tabled. Then, in so far as the curve is a good approximation to
the binomial, these integrations can be used in place of the tedious
finite summation of the terms of the binomial, and the derived
probabilities read off from the table of these integrations, without
any work at all. Now it is apparent from Fig. 75 that with n no
larger than 50 the smooth curve is a quite sufficiently close
approximation to the binomial for all practical statistical purposes, and
we shall be quite justified in so using it in practical work.

All this has been done. The integrals of the smooth curve,
which has the equation

$$y = \frac{n}{\sigma\sqrt{2\pi}}\; e^{-\frac{x^2}{2\sigma^2}}$$

have been calculated and tabled. Such tables are known as tables
of the probability integral. A short table of this kind, but quite
extensive enough for most practical statistical work, is given in
Appendix IV of this book. It carries the argument (the deviation
from the center measured on the $x/\sigma$ yardstick) to two places of
figures, and the function to four places. Besides the area the
individual ordinate corresponding to the same argument is given
in each case.

The curve itself is known as the normal curve, or from its
discoverers, the De Moivre-Gauss-Laplace curve of error. It has many
and varied properties and uses in statistics, space for the discussion
of which is lacking in this book. It may truly be said to be
the very corner-stone of the foundation of the statistical
treatment of observational data, whether quantitative or qualitative
in character.

As  an  example  of  the  use  of  the  probability  integral  to  replace 
finite  summation  of  the  terms  of  the  point  binomial  we  may  take 
the  case  propounded  above  regarding  the  sex  ratio  of  births. 

Here

$$n = 10{,}000, \quad p = .52, \quad q = .48$$

Hence

$$\sigma = \sqrt{n\,p\,q} = \sqrt{10{,}000 \times .52 \times .48} = 49.96$$

$$x = 5300 - 5200 = 100$$

$$\frac{x}{\sigma} = \frac{100}{49.96} = 2.00$$


Thus we have the situation depicted graphically in Fig. 76.

Now in the table in Appendix IV there is found against the
argument 2.00 the figure .4772. This means that, taking the total
area of the curve as 1, the area of that part of the curve (A) between
the mid-ordinate (x/σ = 0) and the ordinate where x/σ = 2 is
.4772. Therefore the fraction of the area of the whole curve up to
the ordinate where x/σ = 2 will be B + A = .5 + .4772 = .9772.
Hence the area of the rest of the curve, which measures the
probability of positive deviations of x/σ = 2 and greater, will be
1 − .9772 = .0228. Or we say that the chances are about 2¼ in a
hundred that in any given year in our hypothetic city there will be
5300 or more male babies born. Or, put in another way, we should not
expect, on the premises stated in the example, 5300 male births
in a year to be equalled or exceeded oftener than between two or
three times in a century.
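
The arithmetic of this example can be retraced with a short Python sketch. It is only an illustration: the function name is hypothetical, and the tail area is taken from the complementary error function in the standard library rather than from the table in Appendix IV, so the last decimal may differ slightly from the tabled figure.

```python
import math


def upper_tail_normal(z):
    """Area of the unit normal curve lying beyond z sigma above the mean."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Values taken from the sex-ratio example in the text.
n, p = 10_000, 0.52
q = 1.0 - p

sigma = math.sqrt(n * p * q)   # sigma of the point binomial
x = 5300 - n * p               # deviation of 5300 from the expected 5200
z = x / sigma                  # deviation on the x/sigma yardstick
tail = upper_tail_normal(z)    # chance of 5300 or more male births

print(round(sigma, 2), round(z, 2), round(tail, 4))
```

The printed tail area should agree with the .0228 of the text to about three decimal places; the small difference comes from using x/σ unrounded instead of the tabular argument 2.00.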

THE RELATION BETWEEN σ AND THE PROBABLE ERROR

We have used in the discussion in this chapter σ as the yardstick
to measure deviations. In an earlier chapter the probable error
has been used for the same purpose, though that phase of the
matter was not then emphasized. What is the relation between
the two? It is a simple one, that given by the following equation:

$$\text{P. E.} = .6744898\ldots\,\sigma$$


CHAPTER XII


SOME  SPECIAL  THEOREMS  IN  PROBABILITY 

In this chapter will be discussed some special developments
and applications of the theory of probability likely to be of
particular use to the medical worker.


THE  CHI-SQUARE  TEST 


A.  The  Goodness  of  Fit  of  Theory  to  Observation 

In 1900 Professor Karl Pearson published a paper¹ which opened
with the following sentence in italics: “The subject of this paper is
to investigate a criterion of the probability on any theory of an observed
system of errors, and to apply it to the determination of goodness of
fit in the case of frequency curves.” This paper has now become a
classic, and from it, and later papers elaborating and extending the
theory which it embodies, have come some of the most important
developments in modern statistical work.

Pearson showed that the probability sought in his opening
sentence is given by the expression

$$P = \frac{\int_{\chi}^{\infty} e^{-\frac{1}{2}\chi^{2}}\, \chi^{n-1}\, d\chi}{\int_{0}^{\infty} e^{-\frac{1}{2}\chi^{2}}\, \chi^{n-1}\, d\chi}$$

The expression χ² = constant is the equation of a generalized
“ellipsoid,” all over the surface of which the frequency of the
system of errors or deviations is constant.

Tables showing the value of P for different values of χ² are
now available in Pearson’s “Tables for Statisticians and
Biometricians.” The consequence is that the application of the
chi-square test of goodness of fit is an extremely simple matter.

Translating the matter abruptly from abstract mathematical
notations to concrete statistical data, the value of χ², for a system
of observed and theoretical frequencies, is the sum of the ratios of
the squared differences between theoretical and observed
frequencies, to the theoretical frequencies, for all such pairs of
observed and theoretical frequencies.
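
In symbols, the verbal definition just given may be written as follows. The notation is an assumption chosen to agree with the column headings of Table 48, m′ᵣ standing for an observed frequency and mᵣ for the corresponding theoretical frequency, with the sum taken over all frequency classes:

```latex
\chi^{2} = \sum_{r} \frac{(m_r - m'_r)^{2}}{m_r}
```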

An example will make the meaning clear. Table 48 gives the
observed frequencies (m′ᵣ) of the weight of the brain in 416 adult
Swedish males,* and the theoretical frequencies (mᵣ) given by the
normal curve,

$$y = 78.0401\, e^{-.01106\, x^{2}}$$

fitted to the observed frequencies. These theoretical frequencies
in Table 48 are areas of the normal curve standing over the abscissal
intervals indicated in the first column.

TABLE 48

Observed and Theoretical Frequencies of Swedish Male Brain Weight, to
Show the Method of Calculating χ²

Grams of            Observed     Calculated     (mᵣ − m′ᵣ)²/mᵣ
brain weight.        (m′ᵣ)          (mᵣ)

Under 1100              0            .981            .981
1100-1149               1           2.9             1.24
1150-1199              10           8.5              .26
1200-1249              21          20.3              .02
1250-1299              44          39.0              .64
1300-1349              53          60.4              .91
1350-1399              86          75.2             1.55
1400-1449              72          75.3              .14
1450-1499              60          60.8              .01
1500-1549              28          39.4             3.29
1550-1599              25          20.6              .94
1600-1649              12           8.7             1.25
1650-1699               3           2.9              .003
1700-1749               1            .8              .05
1750 and over           0            .036            .036

Totals                416         415.817          11.320 = χ²

The number of frequency groups here is 15,† and χ² (the sum
of the quantities in the last column of the table) = 11.32. From
Pearson’s Tables (p. 27), with n′ = 15 and χ² = 11, we read P =
.686036. For n′ = 15 and χ² = 12 we read P = .606303. This
result means that if the brain weight of Swedish males in general
varied absolutely and strictly in accordance with the normal law
of error, it would be expected that in about 65 out of every 100
trials of the matter, each trial being made with a random sample of
416 individuals, the divergence between observation and theory
would be greater than it is in this case. Therefore, it may be
concluded that the normal curve gives an excellent fit to these
observations.

* From p. 41 of Pearl, R.: Biometrical Studies on Man. I. Variation and
Correlation in Brain-weight, Biometrika, vol. 4, pp. 13-104, 1905.

† This is plainly arbitrary since the comparison could be set up with more or
fewer classes (as, for example, by lumping the tail classes at each end). In cases where
the tail classes are found to contribute heavily to the value of χ² they should be so
lumped together.
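
The computation of Table 48 and the reading of P can be sketched in Python as below. This is an illustration under stated assumptions, not the procedure by which Pearson's Tables were prepared: the function names are hypothetical, and P is obtained from a closed-form sum that holds only when n′ − 1 is an even number (as it is for n′ = 15). Because the theoretical frequencies are entered as rounded in the table, the computed χ² may differ from 11.32 in the second decimal.

```python
import math

# Observed (m'_r) and calculated (m_r) frequencies transcribed from Table 48.
observed = [0, 1, 10, 21, 44, 53, 86, 72, 60, 28, 25, 12, 3, 1, 0]
calculated = [0.981, 2.9, 8.5, 20.3, 39.0, 60.4, 75.2, 75.3, 60.8, 39.4,
              20.6, 8.7, 2.9, 0.8, 0.036]


def chi_square(obs, theo):
    """Sum of (theoretical - observed)^2 / theoretical over all classes."""
    return sum((t - o) ** 2 / t for o, t in zip(obs, theo))


def p_from_n_prime(chi2, n_prime):
    """P for Pearson's tabular argument n', for n' - 1 even only.

    Closed form used: exp(-x/2) * sum of (x/2)^k / k! for k below (n'-1)/2.
    """
    index = n_prime - 1
    if index <= 0 or index % 2 != 0:
        raise ValueError("this closed form needs n' - 1 even and positive")
    half = chi2 / 2.0
    series = sum(half ** k / math.factorial(k) for k in range(index // 2))
    return math.exp(-half) * series


chi2 = chi_square(observed, calculated)
print(round(chi2, 2))
print(round(p_from_n_prime(11, 15), 6))    # compare with the tabled .686036
print(round(p_from_n_prime(12, 15), 6))    # compare with the tabled .606303
print(round(p_from_n_prime(chi2, 15), 2))  # P for the observed chi-square
```

Interpolating between the two tabled values, as the text does, and evaluating P directly at the observed χ² should lead to the same conclusion of roughly 65 in 100.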

In the use of the chi-square test of the goodness of fit of theory
to observation the following points must always be kept in mind:

1.  That  the  test  in  this  form  is  valid  only  for  frequencies,  not 
for  ratios,  rates,  or  time  ordinates. 

2.  That  the  theoretical  frequencies  must  be  areas  above  the 
abscissal  class  ranges,  and  not  mid-ordinates. 

3.  That  if  the  frequencies  are  very  small  and  scattering  toward 
the  tails  of  the  curve,  as  is  often  the  case,  a more  reliable  estimation 
of  P will  be  obtained  if  the  tail  frequencies  are  lumped  together  in 
two  single  classes,  one  at  each  tail  end  of  the  curve. 
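
The third point can be illustrated on the figures of Table 48. In the sketch below the helper name and the choice of merging two classes at each end are assumptions made only for the example; how many tail classes to lump in a given case remains a matter of judgment.

```python
# Observed and calculated frequencies transcribed from Table 48.
observed = [0, 1, 10, 21, 44, 53, 86, 72, 60, 28, 25, 12, 3, 1, 0]
calculated = [0.981, 2.9, 8.5, 20.3, 39.0, 60.4, 75.2, 75.3, 60.8, 39.4,
              20.6, 8.7, 2.9, 0.8, 0.036]


def lump_tails(values, k):
    """Merge the first k and the last k classes into one class at each end."""
    return [sum(values[:k])] + list(values[k:-k]) + [sum(values[-k:])]


def chi_square(obs, theo):
    """Sum of (theoretical - observed)^2 / theoretical over all classes."""
    return sum((t - o) ** 2 / t for o, t in zip(obs, theo))


K = 2  # hypothetical choice: two classes merged at each tail
obs_lumped = lump_tails(observed, K)
theo_lumped = lump_tails(calculated, K)

chi2_all = chi_square(observed, calculated)
chi2_lumped = chi_square(obs_lumped, theo_lumped)

print(len(observed), round(chi2_all, 2))
print(len(obs_lumped), round(chi2_lumped, 2))
```

After lumping there are 13 classes instead of 15, so the tables would be entered with n′ = 13 and the new value of χ².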


THE  FOUR-FOLD  TABLE 

One of the most common applications of the χ² test arises in
cases where we have knowledge of the frequencies of each of the
four possible combinations of two attributes in respect of presence
or absence. Such knowledge is conveniently presented in a table
of the following type:

                             1st Attribute
                       +             −             Totals
2nd Attribute    +     a             b             (a + b)
                 −     c             d             (c + d)
Totals              (a + c)       (b + d)       (a + b + c + d) = N

In this table a, b, c, and d are the frequencies with which two
attributes have been observed according to the indicated presence
(+) or absence (−) of each.

The question to be answered by such a table is as to whether or
not there is any significant association between the two attributes.
This question may be answered by considering the divergence of
the observed table from a theoretical table constructed on the
assumption that the two attributes are completely independent. If
the attributes are completely independent, the theoretical
frequency to be expected in the upper left hand corner of the table
(for which the observed frequency is a) would obviously be

$$\frac{a+b}{N} \cdot \frac{a+c}{N} \cdot N,$$

in accordance with the theorem regarding the probability of
concurrent events, set forth in Chapter XI, supra, because the
probability of the presence of the first attribute alone is
(a + c)/N; the probability of the presence of the second attribute
alone is (a + b)/N; and the probability of both the first and the
second attribute being present together will be
(a + c)/N · (a + b)/N; and in turn the expected number of such
combined occurrences will be this last probability multiplied by N.
Similar expressions can be written for each of the other cells. Thus
the expression for the theoretical expected frequency for the case of
presence of the first attribute and absence of the second (for which
the observed frequency is c) will be

$$\frac{a+c}{N} \cdot \frac{c+d}{N} \cdot N.$$

The χ² test of the preceding section can therefore be applied
to determine whether or not there is any significant association
between the two attributes under consideration. The only
additional point to be noted is that since the theoretical frequencies are
determined from the marginal totals of the observed frequencies,
there is but one independent comparison to be made, although
four cells are under consideration. The fact that there is but one
degree of freedom may be shown by considering the difference
between theoretical and observed for each cell in turn, when
it will be found that all four differences are the same. For
example:

$$\frac{(a+b)(a+c)}{N} - a = \frac{a^{2} + ab + ac + bc - a^{2} - ab - ac - ad}{a+b+c+d} = \frac{bc - ad}{a+b+c+d},$$

and

$$\frac{(a+b)(b+d)}{N} - b = \frac{ab + b^{2} + ad + bd - ab - b^{2} - bc - bd}{a+b+c+d} = \frac{ad - bc}{a+b+c+d}.$$

These two differences are alike except for sign and it will be
found on trial that the other two differences are arithmetically equal
to

$$\frac{ad - bc}{a+b+c+d}.$$
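
The statement that all four differences are arithmetically equal can be tried on any set of cell frequencies. The sketch below uses exact rational arithmetic; the four counts are hypothetical figures chosen only for illustration, and the function name is likewise an assumption.

```python
from fractions import Fraction


def deviations_from_independence(a, b, c, d):
    """Theoretical minus observed frequency for each cell of a four-fold table."""
    n = a + b + c + d
    theoretical = {
        "A": Fraction((a + b) * (a + c), n),
        "B": Fraction((a + b) * (b + d), n),
        "C": Fraction((c + d) * (a + c), n),
        "D": Fraction((c + d) * (b + d), n),
    }
    observed = {"A": a, "B": b, "C": c, "D": d}
    return {cell: theoretical[cell] - observed[cell] for cell in theoretical}


# Hypothetical cell frequencies, not taken from the text.
a, b, c, d = 30, 10, 20, 40
n = a + b + c + d

devs = deviations_from_independence(a, b, c, d)
common = Fraction(abs(a * d - b * c), n)

for cell, value in devs.items():
    print(cell, value)
print("common magnitude:", common)
```

Since the four differences share one magnitude and their signs alternate, fixing any one of them fixes the rest, which is another way of saying that the table carries a single degree of freedom.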

So then we shall have the following theoretical four-fold table
for the assumed case of complete independence of the two attributes:

                             1st Attribute
                       +                  −                  Totals
2nd Attribute    +     A                  B                  (A + B) = (a + b)
                 −     C                  D                  (C + D) = (c + d)
Totals          (A + C) = (a + c)  (B + D) = (b + d)         N

Here the capital letters indicate the theoretical frequencies and
the small letters the observed frequencies as before.

Then, in accordance with the preceding section,

$$\chi^{2} = \frac{(A-a)^{2}}{A} + \frac{(B-b)^{2}}{B} + \frac{(C-c)^{2}}{C} + \frac{(D-d)^{2}}{D}.$$


In making a comparison between two or more series of
observations the number of frequency classes involved in the comparison
must be taken into consideration. R. A. Fisher has shown that

$$n = (r - 1)(c - 1),$$

where n is the number of “degrees of freedom” in making such
comparisons, r is the number of rows, and c the number of columns in
the table.

In Pearson’s Tables (pp. 26-28) the tabular argument is in terms
of n′ = (r − 1)(c − 1) + 1.

Thus we need a table to show the values of P associated with χ²
values when we have but one degree of freedom, that is when n′
of the notation of the previous section = 2 (that is, (r − 1)(c − 1)
+ 1 = 2). But since, as is shown later, χ² = (x/σ)², Table B of
Appendix III may be used. It will only be necessary to take the
square root of the observed χ² and enter Table B with that value
as argument.

As an illustration of the application of χ² to a four-fold table,
consider the following table* which shows the individuals under
consideration, tested as to, first, whether or not they had enlarged
spleens, and, second, as to whether or not their blood films showed
presence of malaria parasites.

                               Enlarged Spleen
                          +            −            Totals
Parasites in      +      740          743            1483
blood film        −     1287         2731            4018
Totals                  2027         3474            5501

We may compute χ² directly by inserting in each cell the
theoretical frequency, and then using the formula given above. Thus
we have

                               Enlarged Spleen
                          +                  −                Total
Parasites in      +    546.45 (740)      936.55 (743)         1483
blood film        −   1480.55 (1287)    2537.45 (2731)        4018
Total                    2027              3474               5501

(Theoretical frequencies first, observed frequencies in parentheses.)

$$\chi^{2} = \frac{(546.5 - 740)^{2}}{546.5} + \frac{(936.5 - 743)^{2}}{936.5} + \frac{(1480.5 - 1287)^{2}}{1480.5} + \frac{(2537.5 - 2731)^{2}}{2537.5} = 148.6$$


* Data taken from paper by H. C. Clark, M. D., entitled “A Comparison of the
Spleen and Parasite Rates as Measure of Malaria Incidence in the Races of the
Mainland of Central America.” Seventeenth Annual Report of Medical Department of
United Fruit Co., 1928.

If we take the algebraic form of this four-fold table, and operate
with it as with the table above, we may arrange χ² in this form

$$\chi^{2} = \frac{(ad - bc)^{2}\, N}{(a+b)(c+d)(a+c)(b+d)}$$

For the above example this gives

$$\chi^{2} = \frac{[(740 \times 2731) - (743 \times 1287)]^{2}\; 5501}{1483 \times 4018 \times 2027 \times 3474} = \frac{(1064699)^{2}\; 5501}{(5958694)(7041798)} = \frac{623585 \times 10^{10}}{419599 \times 10^{8}} = 148.6,$$

which is the same as the previous result.
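
Both routes to χ² for the spleen and parasite table can be followed in a few lines of Python. The function names are hypothetical; the cell frequencies are those of the table above. The direct route here carries the theoretical frequencies unrounded, so the result may differ in the last decimal from a hand computation with rounded entries.

```python
def theoretical_frequencies(a, b, c, d):
    """Cell frequencies expected if the two attributes are independent."""
    n = a + b + c + d
    row_plus, row_minus = a + b, c + d
    col_plus, col_minus = a + c, b + d
    return (row_plus * col_plus / n, row_plus * col_minus / n,
            row_minus * col_plus / n, row_minus * col_minus / n)


def chi_square_direct(a, b, c, d):
    """Sum of (theoretical - observed)^2 / theoretical over the four cells."""
    observed = (a, b, c, d)
    theoretical = theoretical_frequencies(a, b, c, d)
    return sum((t - o) ** 2 / t for o, t in zip(observed, theoretical))


def chi_square_shortcut(a, b, c, d):
    """The same quantity from (ad - bc)^2 N over the product of the margins."""
    n = a + b + c + d
    return (a * d - b * c) ** 2 * n / ((a + b) * (c + d) * (a + c) * (b + d))


# Observed frequencies from the spleen and parasite table.
a, b, c, d = 740, 743, 1287, 2731

theo = theoretical_frequencies(a, b, c, d)
direct = chi_square_direct(a, b, c, d)
shortcut = chi_square_shortcut(a, b, c, d)

print([round(t, 2) for t in theo])
print(round(direct, 1), round(shortcut, 1))
```

The shortcut form needs only the four observed frequencies and their marginal totals, which is why it is the more convenient of the two for hand computation.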

If we look up in Table B of Appendix III, in which n = 1, P for
χ² = 148 (χ = 12.1655), we find that this value of χ = x/σ is outside
the range of the table. Thus P must be less than 0.00000000026.
This indicates that the actually observed concurrent frequency of
enlarged spleen and parasites in the blood is significantly different
from the theoretical concurrent frequency when there is no
association between the two attributes. Hence we may conclude that
these two attributes are definitely and significantly associated.
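
Where the argument falls outside the range of a printed table, P for one degree of freedom can be computed instead. The sketch below rests on the relation χ² = (x/σ)² used in this section and takes P as the area of the normal curve lying beyond ±χ in both tails together; the function name is hypothetical, and the standard library's complementary error function stands in for Table B.

```python
import math


def p_one_degree_of_freedom(chi2):
    """P for one degree of freedom: normal area beyond +/- sqrt(chi2)."""
    return math.erfc(math.sqrt(chi2 / 2.0))


p_example = p_one_degree_of_freedom(148.0)  # the spleen and parasite case
p_check = p_one_degree_of_freedom(3.841)    # familiar 5 per cent. point

print(p_example)
print(round(p_check, 3))
```

The value found for χ² = 148 is many orders of magnitude smaller than the bound quoted in the text, so the conclusion drawn there is unaffected.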

We  may  examine  this  association  from  still  another  point  of 
view  by  considering  the  two  groups  of  persons  having  positive  and 
negative  blood  films  and  forming  the  proportions  of  individuals  in 
each  group  that  have  enlarged  spleens. 

Thus

p₁ = per cent. of enlarged spleens in blood + group = 49.899
p₂ = per cent. of enlarged spleens in blood − group = 32.031

The difference between these two percentages is 17.868, and
this difference may be considered in terms of the standard error
of the difference in order to determine its significance. Since it is
a question as to whether or not these two percentages could have
arisen from the same universe, we may use the p of the marginal
total for the determination of the standard error of p₁ and p₂.

Thus, by principles explained in Chapter XI,

$$p = 100 \cdot \frac{2027}{5501} = 36.848$$

Then, again, by the theory of simple probability developed in
Chapter XI,

$$\sigma_{p_1} = \sqrt{\frac{(36.848)(63.152)}{1483}} = \sqrt{1.56915} = 1.253$$

$$\sigma_{p_2} = \sqrt{\frac{(36.848)(63.152)}{4018}} = \sqrt{.57915} = .761$$

$$\sigma_{p_1 - p_2} = \sqrt{\sigma_{p_1}^{2} + \sigma_{p_2}^{2}} = \sqrt{2.14830} = 1.4657$$

$$\frac{\text{Difference}}{\text{Standard error of difference}} = \frac{p_1 - p_2}{\sigma_{p_1 - p_2}} = \frac{17.868}{1.4657} = 12.19$$

Now χ = √148.6 = 12.19, and the fact that χ² = (x/σ)²
can be shown in general.

When we test a four-fold table by χ², we are, therefore, testing
as to whether or not there is any significant difference between any
two of the properly contrasted proportions of the table.
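
The agreement between the ratio of the difference to its standard error and the square root of χ² can be checked numerically. The following sketch repeats the steps of this section for the spleen and parasite figures; the function names are hypothetical.

```python
import math


def difference_ratio(x1, n1, x2, n2):
    """Difference of two percentages divided by its standard error.

    The standard errors are based on the percentage of the marginal total.
    """
    p1 = 100.0 * x1 / n1
    p2 = 100.0 * x2 / n2
    p = 100.0 * (x1 + x2) / (n1 + n2)   # percentage of the marginal total
    q = 100.0 - p
    sigma_p1 = math.sqrt(p * q / n1)
    sigma_p2 = math.sqrt(p * q / n2)
    sigma_diff = math.sqrt(sigma_p1 ** 2 + sigma_p2 ** 2)
    return (p1 - p2) / sigma_diff


def chi_square_four_fold(a, b, c, d):
    """(ad - bc)^2 N divided by the product of the four marginal totals."""
    n = a + b + c + d
    return (a * d - b * c) ** 2 * n / ((a + b) * (c + d) * (a + c) * (b + d))


# Enlarged spleens: 740 of 1483 with parasites, 1287 of 4018 without.
ratio = difference_ratio(740, 1483, 1287, 4018)
chi2 = chi_square_four_fold(740, 743, 1287, 2731)

print(round(ratio, 2), round(math.sqrt(chi2), 2))
```

The two printed figures should coincide, which is the numerical counterpart of the general identity χ² = (x/σ)² for the four-fold table.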

THE  CHI-SQUARE  COMPARISON  OF  TWO  OBSERVED  SAMPLES 

A third application of the χ² test which we owe to Pearson²
should be widely useful to medical men. Problems of the following
sort arise constantly: Given two frequency distributions of
phenomena, what is the probability, on the one hand, that the two
can be regarded as random samples from the same population,
whose characteristics are known only from the samples; or, put the
other way about, what is the probability that the one distribution is
really different from the other to a greater degree than could
reasonably be supposed to have arisen by the operation of chance alone?

Pearson shows that if we let the population from which the two
samples, if undifferentiated, are supposed to be drawn be given by
the class frequencies

$$m_1,\; m_2,\; m_3,\; m_4,\; \ldots\; m_p,\; m_q,\; \ldots\; m_s$$

the total population being M, and let the samples be given by the
frequencies in the same classes:

                                                              Total
First sample      f_1    f_2    f_3   ...  f_p    f_q   ...  f_s     N
Second sample     f'_1   f'_2   f'_3  ...  f'_p   f'_q  ...  f'_s    N'

where the totals N and N′ differ widely or little, and then form a
quantity

$$\chi^{2} = S_{1}^{s} \left\{ N N' \, \frac{\left( \dfrac{f_p}{N} - \dfrac{f'_p}{N'} \right)^{2}}{f_p + f'_p} \right\}$$

where S₁ˢ denotes summation of like quantities from 1 to s, that
then the required probability that the two samples are
undifferentiated, i. e., did come as random samples from the same population,
may be found by looking out the value of P corresponding to the
ascertained χ² and n′ (the number of classes) from the tables given
on pp. 26-29 of Pearson’s “Tables for Statisticians and
Biometricians.”

Let an example make the theorem plain. MacDonald* gave
the following distributions of hair color of children attacked (a)
with scarlet fever and (b) with measles, from data collected in the
Glasgow Corporation Fever Hospitals.

The question is: Do scarlet fever and measles attack
individuals indifferently and at random so far as concerns hair
pigmentation? Or, in other words, are the scarlet fever and measles
distributions, in respect of hair color, different from each other
only by so much as might arise by chance in samples of the size of
these?

TABLE 49

Data on the Incidence of Scarlet Fever and Measles in Relation to Hair
Pigmentation

(MacDonald’s Data)

                      Number of cases of
Hair color.      Scarlet fever.      Measles.

Black                  12                0
Dark                  289               85
Medium               1109              367
Fair                  360              184
Red                    94               25

Totals               1864              661

* MacDonald, David: Pigmentation of the Hair and Eyes of Children Suffering
from the Acute Fevers; Its Effect on Susceptibility, Recuperative Power, and Race
Selection, Biometrika, vol. 8, pp. 13-39, 1911.

The distributions are shown graphically in Fig. 77. The
numerical work is set forth in Table 50.

Fig. 77. Distribution of scarlet fever and measles in respect of hair color of those
attacked.


From Table 50,

$$\chi^{2} = NN' \times .000{,}0211 = 1864 \times 661 \times .000{,}0211 = 26.00$$

P from the tables is about .000,03. In other words, the odds
are more than 33,000 to 1 against the occurrence of two such
divergent samples of hair color if they were random samples from the
same population. We can conclude that they are really
differentiated samples, or that scarlet fever and measles do not attack
indifferently all individuals whatever their hair pigmentation; or,
that scarlet fever and measles are differential in their selection.
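
The numerical work for MacDonald's data can be retraced as follows. The sketch is illustrative only: the function names are hypothetical, and P is computed from a closed-form sum valid when n′ − 1 is even (here n′ = 5) rather than read from Pearson's Tables, so it agrees with the tabled figure only to the accuracy quoted in the text.

```python
import math

# Cases by hair color (Black, Dark, Medium, Fair, Red), from Table 49.
scarlet_fever = [12, 289, 1109, 360, 94]
measles = [0, 85, 367, 184, 25]


def two_sample_chi_square(f, f_prime):
    """N N' times the sum of (f/N - f'/N')^2 / (f + f') over the classes."""
    n, n_prime = sum(f), sum(f_prime)
    total = 0.0
    for fp, fpp in zip(f, f_prime):
        if fp + fpp == 0:
            continue  # a class empty in both samples contributes nothing
        total += (fp / n - fpp / n_prime) ** 2 / (fp + fpp)
    return n * n_prime * total


def p_from_n_prime(chi2, n_prime):
    """P for Pearson's tabular argument n', for n' - 1 even only."""
    index = n_prime - 1
    if index <= 0 or index % 2 != 0:
        raise ValueError("this closed form needs n' - 1 even and positive")
    half = chi2 / 2.0
    series = sum(half ** k / math.factorial(k) for k in range(index // 2))
    return math.exp(-half) * series


chi2 = two_sample_chi_square(scarlet_fever, measles)
p_value = p_from_n_prime(chi2, len(scarlet_fever))

print(round(chi2, 2))
print(p_value)
```

The computed χ² falls a little short of 26.00 because the text multiplies NN′ by the rounded sum .000,0211; the difference has no bearing on the conclusion.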

It  will  be  seen  that  the  arithmetical  work  is  not  difficult,  and 
the  usefulness  of  the  method  in  drawing  correct  conclusions  from 
many  classes  of  medical  data  is  great.  One  caution  must  always 
be  kept  in  mind.  The  validity  of  the  method  depends  upon  the 
data  tested  being  frequencies.  It  is  not  directly  applicable  to 
rates,  indices,  or  true  ordinates. 

PRACTICAL  PROBLEMS  OF  SAMPLING 

In the practical affairs of life perhaps the most frequent use of
the statistical method which is made, either consciously or
unconsciously, is to form a judgment of the probable constitution of an
unknown universe, on the basis of the constitution of a sample of
known constitution drawn at random from it.

For  example,  suppose  it  to  be  assumed  that,  in  order  to  justify 
mass  treatment  for  hookworm  infestation  in  a population,  70  to  80 
per  cent,  of  the  people  must  harbor  the  worms.  How,  by  a process  of 
sampling  in  making  examinations,  shall  it  be  ascertained  that  this 
proportion  of  the  people  does,  in  fact,  probably  harbor  the  worms? 

This  is  not  an  easy  or  simple  problem.  Much  research  still 
needs  to  be  done  on  the  general  problem  of  which  the  one  cited  is  a 
particular  case,  before  we  shall  be  able  to  proceed  with  entire 
precision,  and  certainty  of  the  validity  of  all  the  methods  employed 
in  its  solution.  But  in  the  meantime  the  problem  is  of  such  great 
practical  importance  to  every  scientific  worker  that  it  seems 
desirable  to  discuss  it  in  some  detail  here. 

In the first place it can be seen at once that an adequate
judgment of the constitution can only be arrived at if:

(a)  The  sample  is  a good  one. 

(b)  The  sample  is  an  adequate  one. 

By  a “good”  sample  is  meant  one  which  is  fairly  representative 
qualitatively  of  the  universe  from  which  it  is  drawn.  By  an 
“adequate”  sample  is  meant  one  which  is  large  enough  in  point  of 
numbers  to  satisfy  the  requirements  of  the  theory  of  probability. 

To get a good sample, if we are working in the realm of living
things, is a biologic problem primarily and fundamentally. How
shall it be gone about? Evidently the general criterion is that the
sample should contain at least one individual from each of the
classes of the universe known from prior experience to be
differentiated in any important particular from all other classes in the
universe. Thus, to consider the hookworm case, we know,
quite apart from hookworm problems at all, that mankind is
differentiated everywhere into classes in respect of

(a) Age.

(b) Sex.

(c) Race (or color).

(d) Geographic location.

That is to say, at any given instant of time it is known that a
human population contains a number of people forming a class
ranging in age from birth to nine years, another class aged ten to
nineteen years, etc. It contains a class of persons like each other,
but different from all the rest, in respect of being males. It
contains perhaps a class of persons who are white, and another class
who are colored. It contains a class of persons who all live in
town A, another class of persons who live on farms in county B, etc.

These are all perfectly well-known and certain differentiations
of the population. Whatever else may be peculiarly distributed
among the individuals of our universe, it is certain that any universe
of human beings from which it is proposed to draw a sample will
contain some or all of these four differentiations which have been
mentioned. Plainly, then, any sample, to be qualitatively
representative of the universe, must contain some individuals from each
of the differentiated classes. Thus, to have a representative sample
from the population of a given locality relative to our hookworm
problem, it would be necessary to take as a minimum one person
in each decade of age, or say 10 in all. But there should also within
each decade be at least one male and one female, and one white
and one colored person, making 4 × 10 or 40 in all. Of course
practically there may be no negroes at all in the locality, or there may
be no persons ninety to ninety-nine years of age, and so on, in any of
which events the necessary sample will be, by so much, reduced.

As regards geographic location the procedure must be in
principle the same. The whole universe dealt with covers a certain
area. To get a representative sample it will therefore be necessary
to lay down over the whole area an imaginary network, in which
all the meshes are of equal and not too large area, and then draw a
sample relative to the other differentiations from within each mesh.

The meaning of all this discussion is that it is both practically
and theoretically wise to make all probabilities specific relative to
already known differentiations of the universe from which the
sample is drawn. Crude probabilities for whole universes in which
differentiation is known to exist rarely have any particular practical
significance. Thus one might ask what is the probability that a
warm-blooded animal will shave tomorrow morning, and put into
the denominator of the fraction all the elephants, tigers, other
mammals, and birds; but supposing there were accurate data to do all this,
the resulting probability would have only a very academic interest,
because everyone already knows beforehand from direct
observational experience that elephants and eagles, for example, do not shave.

This reasoning applies to the hookworm problem in this way.
In a county the situation actually may be this: On four or five
plantations in one corner of the county 90 per cent. of the negro
laborers are infested. Nowhere else in the county nor among the
whites is there more than 1 per cent. of infestation. This is the
real situation, but is unknown to the workers who come into that
county to clean up hookworm by an efficient campaign. By what
general procedure shall the real fact become most speedily known?
Now, plainly, a completely random sample of the county taken as a
whole, and the probability deduced therefrom, would be quite
misleading, and of no practical use in bringing about the prompt
treatment of the negroes on the heavily infested plantations.
But suppose the imaginary network to have been laid down and
each mesh sampled, with due regard to the other differentiations
of color, age, and sex. Then it would at once appear that virtually
all the efforts should be directed to one mesh. Furthermore, if
the individuals to form the sample in each mesh were chosen
relative to the other differentials, color, sex, and age, so that the
sample should contain the two races, the two sexes, and the different
ages, in roughly the proportion that they existed in the population
of the mesh, then it would at once appear that it was the negroes
only who needed mass treatment.

We now come to the question of how many individuals should
be included in the sample taken in the way indicated from each
mesh, or, in short, how large must a sample be to be adequate?
This is a mathematical problem, and, as will appear as we go on, a
problem to which no fixed or unique general answer can be given.
What size of sample is adequate depends in part upon the
constitution of the population. How this works out we may now
consider.

Suppose  a population  of  any  absolute  size  whatever,  say  N , 
except  for  the  restriction  that  it  shall  be  at  least  ten  times  as  large 
as  any  sample  m drawn  from  it. 

Further, suppose that the proportion of hookworm infestation
in N is actually (though unknown to us):

(a) 10 per cent.

(b) 20 per cent.

(c) 30 per cent.

(d) 40 per cent.

(e) 50 per cent.

(f) 60 per cent.

(g) 70 per cent.

(h) 80 per cent.

(i) 90 per cent.

Suppose now we take samples from N, of m individuals in each
sample, and examine certain consequences which flow from different
values of m.

We may then set up the following table (Table 51), which shows
in each cell two figures. These figures are the lower (light) and
upper (heavy) limiting whole numbers of individuals having
hookworm infestation, limits which will be passed, on the average, in
only one sample of the size named out of every 200 such samples
tried, if the general population from which the sample is
drawn is actually infested in the degree indicated by the percentage
figure at the top of the column. That is to say, to take a concrete
example, if 90 per cent. of the population are really infested, in a
random sample of 100 from that population there will not be found
fewer than 82 persons showing infestation as often as once in 200
trials. Odds of 199 to 1 are sufficiently wide to constitute certainty
in most practical statistical matters. These odds indicate a far
smaller fluctuation or error than inheres in the original observational
or experimental data of biology generally.

TABLE 51
Sampling Limits

(In each cell the first figure is the lower (light) limit and the second
the upper (heavy) limit.)

              Actual percentage of occurrence in population N.

    m    10%      20%      30%      40%      50%      60%      70%      80%      90%

   10    0-4      0-6      0-7      0-8      0-10     2-10     3-10     4-10     6-10
   15    0-5      0-7      0-10     1-11     2-13     4-14     5-15     8-15     10-15
   20    0-6      0-9      0-12     2-14     4-16     6-18     8-20     11-20    14-20
   25    0-7      0-11     1-14     3-17     6-19     8-22     11-24    14-25    18-25
   30    0-8      0-12     2-16     5-19     7-23     11-25    14-28    18-30    22-30
   35    0-9      0-14     3-18     6-22     9-26     13-29    17-32    21-35    26-35
   40    0-9      1-15     4-20     8-24     11-29    16-32    20-36    25-39    31-40
   45    0-10     2-16     5-22     9-27     13-32    18-36    23-40    29-43    35-45
   50    0-11     2-18     6-24     11-29    15-35    21-39    26-44    32-48    39-50
   60    0-12     4-20     8-28     14-34    20-40    26-46    32-52    40-56    48-60
   70    0-14     5-23     11-31    17-39    24-46    31-53    39-59    47-65    56-70
   80    1-15     6-26     13-35    20-44    28-52    36-60    45-67    54-74    65-79
   90    1-17     8-28     15-39    24-48    32-58    42-66    51-75    62-82    73-89
  100    2-18     9-31     18-42    27-53    37-63    47-73    58-82    69-91    82-98
  110    2-20     11-33    20-46    30-58    41-69    52-80    64-90    77-99    90-108
  120    3-21     12-36    23-49    34-62    45-75    58-86    71-97    84-108   99-117
  130    4-22     14-38    25-53    37-67    50-80    63-93    77-105   92-116   108-126
  140    4-24     15-41    28-56    41-71    54-86    69-99    84-112   99-125   116-136
  150    5-25     17-43    30-60    44-76    59-91    74-106   90-120   107-133  125-145
  160    6-26     18-46    33-63    48-80    63-97    80-112   97-127   114-142  134-154
  170    6-28     20-48    35-67    51-85    68-102   85-119   103-135  122-150  142-164
  180    7-29     22-50    38-70    55-89    72-108   91-125   110-142  130-158  151-173
  190    8-30     23-53    40-74    58-94    77-113   96-132   116-150  137-167  160-182
  200    9-31     25-55    43-77    62-98    81-119   102-138  123-157  145-175  169-191
  300    16-44    42-78    69-111   98-142   127-173  158-202  189-231  222-258  256-284
  400    24-56    59-101   96-144   134-186  174-226  214-266  256-304  299-341  344-376
  500    32-68    76-124   123-177  171-229  221-279  271-329  323-377  376-424  432-468
  600    41-79    94-146   151-209  209-271  268-332  329-391  391-449  454-506  521-559
  700    49-91    112-168  178-242  246-314  315-385  386-454  458-522  532-588  609-651
  800    58-102   130-190  206-274  284-356  363-437  444-516  526-594  610-670  698-742
  900    66-114   149-211  234-306  322-398  411-489  502-578  594-666  689-751  786-834
 1000    75-125   167-233  262-338  360-440  459-541  560-640  662-738  767-833  875-925

The manner in which Table 51 was calculated needs some discussion.
First, for each value of m and of the percentages of infestation
the sigma ($\sigma$) of the point binomial was calculated. Thus
for 60 per cent. of infestation and m = 100

$$\sigma = \sqrt{100 \times .6 \times .4}$$

The values so obtained were multiplied by 2.58, which is the
$x/\sigma$ value which cuts off just a little more than .005 of the tail area
of the normal curve. The value so obtained was then subtracted
from the mean number expected on each set of m, p, and q values,
to obtain the lower (light) entries in the table, and added to it to
obtain the upper (heavy) entries. The tabled values were adjusted
to whole numbers from the values computed to three places of
decimals by taking for each light entry the next lower whole number,
and for each heavy entry the next higher whole number, regardless
of the value of the decimal portion. This was, of course, to
create a margin of safety, beyond the strictly accurate decimal
values.
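
The rule just described may be sketched in a few lines of Python. This is an illustrative sketch only, not the computation actually used in forming Table 51: the function name `table51_limits` is hypothetical, and the clipping of the limits to the range 0 to m is an assumption inferred from the tabled entries.

```python
import math

def table51_limits(m, p, k=2.58):
    """Return (lower, upper) whole-number limits for samples of m at proportion p."""
    mean = m * p
    sigma = math.sqrt(m * p * (1.0 - p))   # sigma of the point binomial
    lower = math.floor(mean - k * sigma)   # next lower whole number
    upper = math.ceil(mean + k * sigma)    # next higher whole number
    # Assumption: a limit cannot fall below 0 or above m.
    return max(lower, 0), min(upper, m)

# 60 per cent. of infestation with m = 100, the case worked in the text.
print(table51_limits(100, 0.6))
```

Rounding outward on both sides, rather than to the nearest whole number, is what supplies the margin of safety spoken of in the text.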

There may be some inclined to object to the procedure outlined
above, on the ground that in the case of the extremely skew binomials,
say where p = .9 and q = .1, there will be scant justification
for replacing the areas of the binomial with those of the normal
curve, as has been done in the formation of Table 51. Wishing
to see just how much there was in this objection, and also desiring

TABLE 52

Ordinates of Point Binomial, When n = 10. Sum of All Ordinates = 1.00

Favorable       p = .5    p = .6    p = .7    p = .8    p = .9
occurrences.    q = .5    q = .4    q = .3    q = .2    q = .1

10              .00       .01       .03       .11       .35
 9              .01       .04       .12       .27       .39
 8              .04       .12       .23       .30       .19
 7              .12       .21       .27       .20       .06
 6              .21       .25       .20       .09       .01
 5              .25       .20       .10       .03       .00
 4              .21       .11       .04       .01       .00
 3              .12       .04       .01       .00       .00
 2              .04       .01       .00       .00       .00
 1              .01       .00       .00       .00       .00
 0              .00       .00       .00       .00       .00

Sum             1.01      .99       1.00      1.01      1.00


TABLE 53

Ordinates of the Point Binomial When n = 50. Sum of All Ordinates = 1.000000

Favorable       p = .5       p = .6       p = .7       p = .8       p = .9
occurrences.    q = .5       q = .4       q = .3       q = .2       q = .1

50              .000000      .000000      .000000      .000014      .005154
49              .000000      .000000      .000000      .000178      .028632
48              .000000      .000000      .000004      .001093      .077943
47              .000000      .000000      .000028      .004371      .138565
46              .000000      .000000      .000140      .012840      .180904
45              .000000      .000002      .000551      .029531      .184925
44              .000000      .000011      .001771      .055371      .154104
43              .000000      .000047      .004770      .087012      .107628
42              .000000      .000169      .010989      .116922      .064278
41              .000002      .000527      .021978      .136409      .033329
40              .000009      .001440      .038619      .139819      .015183
39              .000033      .003491      .060185      .127108      .006135
38              .000108      .007563      .083830      .103275      .002215
37              .000315      .014738      .105017      .075470      .000719
36              .000833      .025967      .118948      .049864      .000211
35              .001999      .041547      .122347      .029919      .000056
34              .004373      .060589      .114700      .016362      .000014
33              .008746      .080785      .098314      .008181      .000003
32              .016035      .098737      .077247      .003750      .000001
31              .027006      .110863      .055757      .001579      .000000
30              .041859      .114559      .037039      .000612
29              .059799      .109103      .022677      .000218
28              .078826      .095879      .012811      .000072
27              .095962      .077815      .006684      .000022
26              .107957      .058361      .003223      .000006
25              .112275      .040464      .001436      .000001
24              .107957      .025938      .000592      .000000
23              etc.         .015371      .000225
22              symmetrical  .008417      .000079
21              to first     .004257      .000026
20              half.        .001987      .000008
19                           .000854      .000002
18                           .000338      .000001
17                           .000123
16                           .000041
15                           .000012
14                           .000003
13                           .000001
12                           .000000

Sum             .9999997*    .999999      .999998      .999999      .999999

* Of all 51 terms.


to give the reader of this book a concrete idea of the behavior of
binomials with different values of p and q, I asked my assistant,
Dr. Flora Sutton, to calculate the ordinates of a series of binomials.
The results are given in Tables 52 and 53.

Consider the most unfavorable case in Table 51 where n = 10,
and the percentage of occurrence is 90. The table says, on the
basis of normal curve areas, that if 90 is the true unknown percentage,
we shall not get, with samples of 10, fewer than 6 favorable
occurrences. Summing the ordinates of the binomial in the last
column of Table 52, we have $\sum_{0}^{5} = 0.00$. To more than the degree
of refinement that anyone ought to work with on the basis of
samples of 10, the normal curve area adequately approximates the
sum of the terms of the binomial, in the case which is of all in Table
51 most unfavorable to the normal curve.

We  might  let  the  case  rest  here,  but  it  seems  desirable  to  present 
another  table  for  the  binomial  having  n = 50.  This  is  done  in 
Table  53. 

Again, let us test the worst case. Table 51 states that if the
true but unknown composition of the population is 90 per cent.
events of the favorable sort, one will not expect to get in samples of
50 fewer than 39 favorable cases oftener than five times in a
thousand. From the last column of Table 53 the sum of the terms
of the binomial below 39 (that is, for 38 or fewer favorable cases)
is .003220, or about 3 cases in 1000 trials. Below 40 the sum is
.009355, or 9 cases in 1000 roughly. For all practical statistical
purposes it is apparent that Table 51 is a safe guide.
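
The check just made against Tables 52 and 53 can be repeated directly from the terms of the point binomial. The following is a minimal sketch; the function names `binomial_ordinate` and `lower_tail` are hypothetical and chosen only for illustration.

```python
from math import comb

def binomial_ordinate(n, k, p):
    """Ordinate of the point binomial: chance of exactly k favorable occurrences in n."""
    return comb(n, k) * p**k * (1.0 - p)**(n - k)

def lower_tail(n, p, below):
    """Sum of the ordinates for fewer than `below` favorable occurrences."""
    return sum(binomial_ordinate(n, k, p) for k in range(below))

tail_n10 = lower_tail(10, 0.9, 6)    # fewer than 6 in samples of 10
tail_39 = lower_tail(50, 0.9, 39)    # fewer than 39 in samples of 50
tail_40 = lower_tail(50, 0.9, 40)    # fewer than 40 in samples of 50
print(round(tail_n10, 2), round(tail_39, 6), round(tail_40, 6))
```

Both tails for n = 50 lie on either side of five in a thousand, which is the sense in which Table 51 is said to be a safe guide.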

The  practical  uses  of  Table  51  are  obviously  manifold.  It 
enables  one,  either  from  direct  reading  or  interpolation  between 
tabled  values,  to  answer  many  questions  which  arise  in  experimental 
work,  in  field  work,  in  epidemiologic  enquiries,  and,  indeed, 
wherever  in  the  whole  range  of  scientific  investigation  a problem  of 
sampling confronts one.

CHAPTER XIII


THE  MEASUREMENT  OF  VARIATION 


THE  FREQUENCY  DISTRIBUTION 


When  one  measures  with  a sufficient  degree  of  precision  a 
number  of  occurrences  of  any  natural  event  whatever,  he  encounters 
the  phenomenon  of  variation.  No  two  occurrences  are  exactly 
alike,  whether  we  are  concerned  with  a physiologic  event,  such  as 
pulse-rate  or  body  temperature,  or  a morphologic  matter,  such  as 
brain  weight  or  cephalic  index,  or  what  not.  If  one  measures 
exactly  many  events  of  the  same  kind  and  arranges  the  results  in 
progressive  order  he  will  form  a frequency  distribution  of  variation 
(cf.  Chapters  IV  and  VI  supra).  An  example  of  such  a distribution 
is  given  in  Table  54  and  is  exhibited  graphically  as  a histogram  in 
Fig.  78. 

TABLE 54

Frequency Distribution of Variation in Pulse Beats Per Minute in English Convicts*

Pulse beats per minute.      Frequency of occurrence.

  44.5- 48.4                       2
  48.5- 52.4                       5
  52.5- 56.4                      17
  56.5- 60.4                      57
  60.5- 64.4                      90
  64.5- 68.4                     150
  68.5- 72.4                     120
  72.5- 76.4                     131
  76.5- 80.4                     109
  80.5- 84.4                      86
  84.5- 88.4                      62
  88.5- 92.4                      42
  92.5- 96.4                      15
  96.5-100.4                      18
 100.5-104.4                       9
 104.5-108.4                       5
 108.5-112.4                       3
 112.5-116.4                       3

 Total                           924

A word should be said about the designation of the class limits
in the first column of Table 54. The pulse rates, as actually
recorded by the physicians who took the data originally, which went
into the first class were rates of 45, 46, 47, and 48 beats per minute.
But looking at the matter from the viewpoint of exact measurement
a physician’s record of 45 beats per minute really includes on the
average all those rates which, with precise physical instruments for
timing and recording beats, would fall between 44.500 . . . beats
and 45.499 . . . beats per minute. Consequently the class limits
are set down in the way shown in Table 54.

* Whiting, M. H.: A Study of Criminal Anthropometry, Biometrika, vol. 11,
pp. 1-37, 1915.


Fig.  78. — Histogram  showing  frequency  distribution  of  variation  in  pulse  beats  per 

minute  in  English  convicts.  (Data  of  Table  54.) 

This  distribution  shows  in  a rather  typical  manner  the  general 
characteristics  of  frequency  distributions  of  variation,  or  variation 
curves,  as  they  may  briefly,  if  less  precisely,  be  called.  We  see  the 
“cocked  hat”  shape,  with  which  we  became  familiar  in  Chapter  XI, 
indicating  that  the  most  frequent  occurrence  of  variates  is,  in 
general,  near  the  middle  of  the  distribution.  Toward  the  ends  the 
frequency  becomes  smaller  and  smaller  till  it  disappears.  The 
distribution has but a single peak. It might be thought, at first
inspection, that there were two peaks, one on the class 64.5-68.4,
and the other on the class 72.5-76.4 beats per minute. But the
depression on class 68.5-72.4, which gives rise to the impression
of two peaks, is not significantly different from the frequency on
the classes to either side of it, having regard to probable errors,
and consequently means nothing. It is, in fact, merely a result of
random sampling. How do we know this?

If, of N values, $N_1$ lie below X and $N_2$ above it, the probable
error of $N_1$ or $N_2$ is

$$\pm\, .67449 \sqrt{\frac{N_1 N_2}{N}}$$

It is an even chance that N times the true proportion of values
below X lies between $N_1 + .67449 \sqrt{N_1 N_2 / N}$ and
$N_1 - .67449 \sqrt{N_1 N_2 / N}$.
(Cf. Sheppard, Biometrika, Vol. II, p. 178.) So then we have for
the data of Table 54 the results shown in Table 55.


TABLE 55

Probable Errors of Frequencies

    X         N_1       N_2       P. E.

   44.4         0       924
   48.4         2       922       ± 0.95
   52.4         7       917       ± 1.8
   56.4        24       900       ± 3.3
   60.4        81       843       ± 5.8
   64.4       171       753       ± 8.0
   68.4       321       603       ± 9.8
   72.4       441       483       ± 10.2
   76.4       572       352       ± 10.0
   80.4       681       243       ± 9.0
   84.4       767       157       ± 7.7
   88.4       829        95       ± 6.2
   92.4       871        53       ± 4.8
   96.4       886        38       ± 4.1
  100.4       904        20       ± 3.0
  104.4       913        11       ± 2.2
  108.4       918         6       ± 1.6
  112.4       921         3       ± 1.2
  116.4       924         0

We thus see that in the region from 64.5 to 76.4 pulse beats per
minute the probable error of the frequencies is about 10. None of
the differences between neighboring frequencies is of the order of
4 × 10 = 40, which would have to be the case to make any deflection
in this region of the curve significant.
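
The probable errors of Table 55 follow from the relation .67449 √(N₁N₂/N), and a short Python sketch makes the test for the apparent double peak explicit. The function name `probable_error_of_frequency` is hypothetical; the frequencies are those of Table 54.

```python
import math

def probable_error_of_frequency(n1, n2):
    """Probable error of the number of values lying below (or above) a point X."""
    n = n1 + n2
    return 0.67449 * math.sqrt(n1 * n2 / n)

pe_at_72_4 = probable_error_of_frequency(441, 483)   # X = 72.4 in Table 55
pe_at_48_4 = probable_error_of_frequency(2, 922)     # X = 48.4 in Table 55

# Differences between the dipped class (120) and its two neighbors (150, 131).
differences = [150 - 120, 131 - 120]
threshold = 4 * 10   # four times a probable error of about 10
any_significant = any(d >= threshold for d in differences)
print(round(pe_at_72_4, 1), round(pe_at_48_4, 2), any_significant)
```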

CALCULATION  OF  MOMENTS 

Having in this way satisfied ourselves that we are dealing with
an essentially unimodal curve, we may proceed to its analysis, to
the end that we may have quantitative expressions of the characteristic
features of variation in pulse-rate. The first step in the
mathematical analysis of any frequency distribution is to calculate
certain quantities known in theoretic mechanics as “moments of
inertia.” The arithmetic of this process for our pulse rate example
is set forth in Table 56. We shall first calculate the moments about
an arbitrary origin, at the lower range end, and then later transfer
to the mean or center of gravity of the distribution. The first steps
in the calculation are shown in Table 56.


TABLE 56

Calculation of Moments

Midpoint of   Frequency   x (deviation
pulse-rate    Z           from origin in    Zx       Zx^2       Zx^3        Zx^4
class.                    class units).

  46.5            2            0               0          0          0            0
  50.5            5            1               5          5          5            5
  54.5           17            2              34         68        136          272
  58.5           57            3             171        513      1,539        4,617
  62.5           90            4             360      1,440      5,760       23,040
  66.5          150            5             750      3,750     18,750       93,750
  70.5          120            6             720      4,320     25,920      155,520
  74.5          131            7             917      6,419     44,933      314,531
  78.5          109            8             872      6,976     55,808      446,464
  82.5           86            9             774      6,966     62,694      564,246
  86.5           62           10             620      6,200     62,000      620,000
  90.5           42           11             462      5,082     55,902      614,922
  94.5           15           12             180      2,160     25,920      311,040
  98.5           18           13             234      3,042     39,546      514,098
 102.5            9           14             126      1,764     24,696      345,744
 106.5            5           15              75      1,125     16,875      253,125
 110.5            3           16              48        768     12,288      196,608
 114.5            3           17              51        867     14,739      250,563

 Totals         924                        6,399     51,465    467,511    4,708,545

For the moments about the arbitrary origin at a pulse-rate of
46.5, we have, S denoting summation,

$$\nu_1 = \frac{S(Zx)}{S(Z)} = \frac{6399}{924} = 6.925325$$

$$\nu_2 = \frac{S(Zx^2)}{S(Z)} = \frac{51{,}465}{924} = 55.698052$$

$$\nu_3 = \frac{S(Zx^3)}{S(Z)} = \frac{467{,}511}{924} = 505.964286$$

$$\nu_4 = \frac{S(Zx^4)}{S(Z)} = \frac{4{,}708{,}545}{924} = 5095.827922$$

Since we shall have to use powers of these quantities in the
subsequent calculations, it will be well to keep six places of decimals
for the present, in order to ensure the degree of arithmetical accuracy
we shall want at the end. Keeping the decimals at this stage has
nothing whatever to do with the accuracy or reliability of the
original data. It is a purely arithmetical matter.

The next step is to determine, from these moments about the
lower range end as origin, the values of the moments about the
mean. Letting $\pi$ denote a moment about the mean, and observing
that $\nu_1$, $\nu_2$, $\nu_3$, and $\nu_4$ denote moments about any arbitrary origin,
we have

$$\pi_1 = 0 \quad \text{(by definition of the mean)}$$

$$\pi_2 = \nu_2 - \nu_1^2$$

$$\pi_3 = \nu_3 - 3\nu_1\nu_2 + 2\nu_1^3$$

$$\pi_4 = \nu_4 - 4\nu_1\nu_3 + 6\nu_1^2\nu_2 - 3\nu_1^4$$

It should be understood that the above equations for the $\pi$’s
are valid regardless of the origin about which the $\nu$’s are taken. In
the particular example shown in Table 56 the origin was taken at
the center of the class marking the lower observed range end. But
the equations for the $\pi$’s given above would be equally valid if the
origin had been taken at the upper range end, or somewhere in
the middle.

For the pulse-rate example we have:

$$\pi_2 = 55.698052 - 47.960126 = 7.737926$$

$$\pi_3 = 505.964286 - 1157.181336 + 664.278920 = 13.061870$$

$$\pi_4 = 5095.827922 - 14015.868476 + 16027.713552 - 6900.521118 = 207.151880$$
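
The transfer from the arbitrary origin to the mean can be sketched in Python, starting from the four moments about the origin as given in the text to six places of decimals. The function name `central_moments` is hypothetical. Small differences in the last decimal places from the printed values are to be expected, since the printed arithmetic rounds each product separately.

```python
def central_moments(nu1, nu2, nu3, nu4):
    """Second, third, and fourth moments about the mean from moments about any origin."""
    pi2 = nu2 - nu1**2
    pi3 = nu3 - 3 * nu1 * nu2 + 2 * nu1**3
    pi4 = nu4 - 4 * nu1 * nu3 + 6 * nu1**2 * nu2 - 3 * nu1**4
    return pi2, pi3, pi4

pi2, pi3, pi4 = central_moments(6.925325, 55.698052, 505.964286, 5095.827922)
print(round(pi2, 4), round(pi3, 4), round(pi4, 3))
```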

To the values of the moments given above it is necessary to
make certain corrections, to allow for the fact that individual observations
have been grouped in forming the frequency distribution.
The corrections generally used, called after their discoverer, Sheppard’s
corrections, are applicable when, as in our present example,
the curve has reasonably high contact at both ends of the range.
For corrections of the moments of entirely general applicability
see Biometrika, Vol. 12, pp. 231-258. Using $\mu$ to designate a corrected
moment about the mean as origin, Sheppard’s corrections are:

$$\mu_1 = 0$$

$$\mu_2 = \pi_2 - \tfrac{1}{12} \qquad (\tfrac{1}{12} = .083333)$$

$$\mu_3 = \pi_3$$

$$\mu_4 = \pi_4 - \tfrac{1}{2}\pi_2 + \tfrac{7}{240} \qquad (\tfrac{7}{240} = .029167)$$

We then have, from the pulse-rate example,

$$\mu_2 = 7.654593$$

$$\mu_3 = 13.061870$$

$$\mu_4 = 203.312084$$

Besides the moments themselves, we shall need two simple
functions of them, viz.:

$$\beta_1 = \frac{\mu_3^2}{\mu_2^3}, \qquad \beta_2 = \frac{\mu_4}{\mu_2^2}.$$

For the pulse-rate example these have values as follows:

$$\beta_1 = \frac{170.612448}{448.503991} = .380403$$

$$\beta_2 = \frac{203.312084}{58.592794} = 3.469916$$
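
The two functions of the moments may be computed as in the following sketch, which takes β₁ as the square of the third moment over the cube of the second, and β₂ as the fourth moment over the square of the second. These definitions agree with the numerical values printed in the text. The function name `pearson_betas` is hypothetical.

```python
def pearson_betas(mu2, mu3, mu4):
    """Return (beta1, beta2) from the corrected moments about the mean."""
    beta1 = mu3**2 / mu2**3
    beta2 = mu4 / mu2**2
    return beta1, beta2

beta1, beta2 = pearson_betas(7.654593, 13.061870, 203.312084)
print(round(beta1, 6), round(beta2, 6))
```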


With  the  moments  of  the  distribution  in  hand,  the  foundation 
is  laid  for  the  determination  of  the  various  physical  constants  which 
define  and  describe  the  several  aspects  of  the  phenomenon  of 
variation.  These  constants  may  conveniently  be  divided  into 
three  groups  as  follows: 

(1)  Constants  defining  the  type  or  center  of  variation. 

(2)  Constants  measuring  dispersion  or  degree  of  variation. 

(3)  Constants  measuring  the  shape  of  the  variation  curve. 


CONSTANTS  DEFINING  THE  TYPE  OR  CENTER  OF  VARIATION 

The  first  thing  one  wishes  to  know,  when  considering  variation 
philosophically,  is  something  about  the  central  or  typical  condition, 
about  which  the  variation  groups  itself.  There  are  three  constants 
commonly  used  to  define  different  aspects  of  type,  and  together 
they  give  a sufficient  picture  of  the  central  or  typical  condition. 
They  are  the  mean,  the  median,  and  the  mode. 

The  Mean 

The arithmetic mean or average is mechanically the center of
gravity of the frequency distribution. If the histogram of Fig. 78
were cut out of sheet metal of uniform thickness, and then exactly
balanced on a knife edge set at right angles to the base line or x axis,
the point where the knife edge intersected the base would be the
average or mean number of pulse beats per minute of the group of
924 observations included in the distribution. This being so, it
will be readily perceived from the most elementary mechanical
principles, the frequencies being regarded as masses concentrated
at the midpoints of the class sub-ranges on the x axis, that the
mean must be distant from the arbitrary origin, about which the
first raw moments are taken, by the amount of $\nu_1$.

Thus in the pulse-rate example we have:

Pulse beats at point of arbitrary origin                  = 46.5
Number of class units, from origin to mean ($\nu_1$)      = 6.925
Number of pulse beats per class unit                      = 4
Number of pulse beats from origin to mean                 = 27.700
Mean number of pulse beats                                = 74.200


The probable error of the mean, when N, the number of observations
(S (Z) in the notation used in our example), is 15 or more, is

$$\text{P. E. Mean} = \pm\, \chi_1 \sigma$$

where $\chi_1 = .6744898/\sqrt{N}$, and is tabled in Pearson’s “Tables for
Statisticians and Biometricians.”⁶ $\sigma$ is the standard deviation,
a constant already encountered in Chapter XII and further discussed
below. When a mean or average is based upon less than
15 observations, the paper of “Student”³ should be consulted for
the method of procedure to determine the reliability of the mean.

In our present case we have

Mean pulse-rate = 74.200 ± .246 beats per minute.
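
The mean and its probable error may be computed as in the sketch below. The variable names are hypothetical. The value of σ, 11.0668 beats, is that found for this example in the section on the standard deviation. The sketch carries the first moment at full precision, so the mean comes out a shade above the 74.200 of the text, which rounds the first moment to 6.925 before multiplying.

```python
import math

origin = 46.5          # pulse beats at the arbitrary origin
class_unit = 4         # pulse beats per class unit
nu1 = 6399 / 924       # first moment about the origin, in class units
n = 924                # number of observations
sigma = 11.0668        # standard deviation in pulse beats (value from the text)

mean = origin + nu1 * class_unit
pe_mean = 0.6744898 * sigma / math.sqrt(n)   # chi_1 times sigma
print(round(mean, 3), round(pe_mean, 3))
```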

The  Median 

The median is the value of the varying character (i. e., the point
on the x axis) above and below which exactly 50 per cent. of the
variates fall. In our present example 462 (i. e., ½ of 924) pulse-rate
observations fall below the median value, and 462 above it.

The arithmetic of determining the median is most simple. It
can best be illustrated by example. We have seen in Table 55 that
441 observations show pulse beats of 72.4 per minute or less. One-half
of all observations is 462. Therefore it is clear that the median
value must fall somewhere in the 72.5-76.4 class, and the distance
into that class where it falls is evidently in the proportion which
462 - 441 = 21 is to the whole frequency in that class, which is
131. So then what is needed is to determine what 21/131 of 4
pulse beats is, 4 beats being the class unit. This equals 0.641
pulse beat. Consequently, the median is 72.5 + .641 = 73.141
beats per minute.

We should, of course, get the same result if we count down
from the upper range end to determine the median, as we get if
we count ½N up from the lower range end, as was done in the preceding
paragraph. This is in fact so. On our example the frequency
from the upper end down to 76.4 (Table 55) is 352. That
is, there are 352 individuals with pulse rates of 76.5, or above.
½N - 352 = 110. 110/131 of 4 pulse beats equals 3.359. 76.5 -
3.359 = 73.141, which is the same value that was obtained in working
from the other range end.
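
The interpolation for the median may be sketched in Python as follows. The function name `grouped_median` is hypothetical; the class limits and frequencies are those of Table 54.

```python
def grouped_median(lower_limits, freqs, width):
    """Median of a grouped distribution by linear interpolation within a class."""
    half = sum(freqs) / 2
    below = 0                                  # frequency below the current class
    for lower, z in zip(lower_limits, freqs):
        if below + z >= half:
            # Fraction of the class needed to reach half the observations.
            return lower + (half - below) / z * width
        below += z
    raise ValueError("empty distribution")

freqs = [2, 5, 17, 57, 90, 150, 120, 131, 109, 86, 62, 42, 15, 18, 9, 5, 3, 3]
lower_limits = [44.5 + 4 * i for i in range(len(freqs))]
median = grouped_median(lower_limits, freqs, 4)
print(round(median, 3))
```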

It  is  to  be  noted  that  the  median  is  smaller  than  the  mean,  i.  e. , 
lies  to  the  left  of  it  in  the  distribution.  This  means  that  the  curve 
as  a whole  is  asymmetric  or  skew  toward  the  right  end  or  large 
values  of  the  pulse-rate.  We  shall  return  to  this  point  later. 

The probable error of the median is:

P. E. Median = 1.25332 × P. E. mean.

So we have for a final result

Median pulse-rate = 73.141 ± .308 beats per minute.

The  Mode 

The mode is the value of the varying character which, in the
theoretic, true variation curve, exhibits the maximum frequency
of occurrence. Owing to the probable errors of individual frequencies
arising from random sampling, to which attention has
already been called, the true mode may not coincide exactly with the
most frequent class in the observed distribution. This means
merely that the particular observed sample with which we are
dealing has, by chance, a particular class near the center of the
distribution occurring more frequently than it should, in relation
to all the other frequencies in the distribution. Mathematically,
the mode is the point on the theoretic curve which graduates the
observations, where $\frac{dy}{dx} = 0$.

The mode is distant from the mean by a quantity

$$d = \chi \times \sigma$$

$\chi$ is a constant called the skewness, and obviously is the fraction
which the modal distance d is of the standard deviation $\sigma$, since

$$\chi = \frac{d}{\sigma}.$$

The equation for the skewness $\chi$ in terms of the moment coefficients
will be given in a later section (p. 357). It should be expressly
noted that this $\chi$ (skewness) is not the same thing as the $\chi_1$ discussed
above. Then

Mode = Mean - d

The probable error of the modal distance d, in the general case,
may be found from Table 40 in Pearson’s “Tables for Statisticians
and Biometricians.” For most practical statistical purposes what
one wishes to know is whether d is significantly different from zero,
i. e., whether the mode is separated from the mean by an amount
greater than might probably have arisen by chance. In the normal
or Gaussian curve, which, as we have seen, is a symmetric
unimodal, “cocked hat” curve having the equation

$$y = \frac{N}{\sigma\sqrt{2\pi}}\, e^{-\frac{x^2}{2\sigma^2}}$$

the mean and the mode coincide, or d = 0, with a probable error of

$$\text{P. E.}_{d\ \text{(normal curve)}} = \pm\, .67449 \sqrt{\frac{3}{2N}}\; \sigma.$$

Consequently, unless d amounts to three or four times this probable
error, the mode cannot be regarded as significantly different from
the mean.

In our present example we have

$$d = .3289 \times 11.0668 = 3.640$$

$$\text{P. E.}_{d\ \text{(n. c.)}} = \pm\, 0.301.$$

We see that d is more than ten times as large as the probable
error. Hence we may conclude that the point of maximum frequency
in the variation curve, the mode, is significantly different
from the mean. The value of the mode is

Mode = 74.200 - 3.640 = 70.560 beats per minute.
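
The determination of the mode and the test of its separation from the mean may be sketched as below. The variable names are hypothetical, and the skewness .3289 is taken as given, its equation being reserved for a later section. The factor √(3/2N) in the probable error is an assumption of this sketch: it is the form which reproduces the value 0.301 printed in the text.

```python
import math

mean = 74.200          # mean pulse-rate, beats per minute
sigma = 11.0668        # standard deviation, beats per minute
skewness = 0.3289      # value of the skewness given in the text
n = 924                # number of observations

d = skewness * sigma                       # distance of mode from mean
mode = mean - d
# Assumed form of the probable error of d for a normal curve.
pe_d_normal = 0.67449 * math.sqrt(3 / (2 * n)) * sigma
ratio = d / pe_d_normal
print(round(d, 3), round(mode, 3), round(pe_d_normal, 3), round(ratio, 1))
```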

CONSTANTS  MEASURING  DISPERSION  OR  DEGREE  OF  VARIATION 

After  having  defined  and  measured  the  typical  condition  about 
which  variation  is  occurring,  the  next  thing  wanted  is  a measure 
of  the  degree  or  extent  of  the  variation  itself.  In  absolute  terms  the 


Fig.  79. — Frequency  polygons  showing  variation  in  infant  mortality  rate  in  1918 
of  (a)  the  white  population  and  (b)  the  colored  population  of  rural  counties. 


best  measure  of  variation  will  be  one  which  describes  with  precision 
the  extent  of  the  “scatter”  of  the  variates  about  the  mean.  If 
values  of  the  varying  character  widely  different  from  the  mean  or 
typical  condition  are  found  to  occur  with  considerable  frequency, 
it  is  common  sense  to  say  that  the  character  shows  a high  degree  of 
variation.  In  general,  the  more  scattered  the  variates  away  from 
the  typical  condition,  the  more  variable  is  the  character  and  vice 
versa. 

Thus from Fig. 79 it is apparent that the infant mortality rate
in rural areas varies much more in the colored than in the white
population. The broken line polygon is much more “scattered”
or spread out than the solid line one.

THE  STANDARD  DEVIATION 

The  constant  which  has  been  adopted  by  biometricians  to 
measure  in  absolute  terms  the  degree  of  scatter  or  dispersion  of  the 
variates  is  called  the  standard  deviation.  It  is  the  same  quantity 
which  in  theoretic  mechanics  is  called  the  radius  of  gyration.  It 
is  a parameter  of  the  variation  curve,  representing  a distance  on 
the  x axis  such  that  if  the  total  frequency  were  concentrated  at  that 
point  and  connected  by  a rigid  bar  with  the  mean,  the  system  would 
have  the  same  rotational  properties  about  the  mean  in  a frictionless 
medium  as  would  the  whole  distribution  in  its  actual  form  if  it  were 
rotated in the same medium about the mean as an axis. Roughly,
three times the standard deviation on either side of the mean will
include all the variates, as is shown in Fig. 76, Chapter XI. This
is the same quantity which in the discussion of the point binomial
was called $\sigma = \sqrt{npq}$.

The calculation of the standard deviation is done from the following
simple relation, $\sigma$ denoting the standard deviation.

$$\sigma = \sqrt{\mu_2}$$

The probable error of $\sigma$, in distributions of 15 or more individuals,
is

$$\text{P. E.}_{\sigma} = \pm\, \chi_2 \sigma,$$

where $\chi_2 = .67449/\sqrt{2N}$, and is tabled in Pearson’s “Tables for
Statisticians and Biometricians.” Where the distribution contains
fewer than 15 individuals the same caution should be observed
in judging its reliability as has been emphasized for the mean
above.

For our pulse-rate example we have

$$\sigma = \sqrt{7.654593} = 2.766694$$

in units of grouping.

The unit of grouping is 4 pulse-beats per class. Whence

S. D. = 4 × 2.7667 = 11.067 ± .174

pulse-beats per minute.
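
The standard deviation and its probable error follow at once from the corrected second moment, as in this short sketch. The variable names are hypothetical.

```python
import math

mu2 = 7.654593         # corrected second moment, in class units squared
class_unit = 4         # pulse beats per class unit
n = 924                # number of observations

sigma_units = math.sqrt(mu2)                   # sigma in units of grouping
sd = class_unit * sigma_units                  # sigma in pulse beats per minute
pe_sd = 0.67449 * sd / math.sqrt(2 * n)        # chi_2 times sigma
print(round(sigma_units, 6), round(sd, 3), round(pe_sd, 3))
```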


THE  COEFFICIENT  OF  VARIATION 


Since  the  standard  deviation  measures  degree  of  variation  in 
concrete  units,  inches,  pounds,  beats,  degrees,  or  whatever  unit 
the  varying  character  is  measured  in,  it  is  evident  that  its  utility 
for  comparative  purposes  is  much  restricted.  One  cannot  directly 
compare  inches  and  degrees  of  temperature.  Obviously,  there  is 
needed  some  comparative  or  relative  measure  of  variation,  which 
will  make  it  possible  to  discuss  whether,  for  example,  men  are  more 
or  less  variable  in  respect  of  the  weight  of  the  brain  than  in  respect 
of  pulse-rate.  Such  a relative  measure  is  furnished  by  the  constant 
called the coefficient of variation. It expresses the standard
deviation as a percentage of the mean. Symbolically we have

$$\text{C. of V.} = \frac{100\, \sigma}{\text{Mean}}.$$

The probable error of the coefficient of variation is

$$\text{P. E.}_{C.V.} = \pm\, .67449\, \frac{V}{\sqrt{2N}} \left\{ 1 + 2 \left( \frac{V}{100} \right)^2 \right\}^{\frac{1}{2}} = \pm\, \chi_2 \psi$$

where V denotes the coefficient of variation, and both $\chi_2$ and $\psi$
are quantities tabled in Pearson’s “Tables for
Statisticians and Biometricians.” Some caution, which will be,
and can only be, acquired by experience, needs to be used in interpreting
coefficients of variation. In general, one should always
remember that this constant simply measures the degree of scatter
of the distribution in relation to the mean value of the thing varying.
Usually such a relation has real and significant meaning, but sometimes
it does not for reasons inherent in the facts themselves.
While space will not permit of going into details here, it may be
pointed out that one source of the difficulty referred to arises
from the consideration that the mean and the standard deviation
are correlated. We have
where $r_{h\mu_2}$ denotes the coefficient of correlation between mean
and second moment, and $\sigma_h$ is the standard deviation of the mean,
and $\sigma_{\mu_2}$ the standard deviation of the second moment.

In our present example the coefficient of variation is

$$\text{C. V.} = \frac{11.0668 \times 100}{74.200} = 14.915 \pm .239 \text{ per cent.}$$
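
The coefficient of variation and its probable error may be sketched as follows. The variable names are hypothetical. The form taken for ψ, namely V{1 + 2(V/100)²}^½, is an assumption of this sketch; it reproduces the probable error of .239 printed in the text.

```python
import math

sd = 11.0668           # standard deviation, beats per minute
mean = 74.200          # mean pulse-rate, beats per minute
n = 924                # number of observations

cv = 100 * sd / mean                               # coefficient of variation, per cent.
chi_2 = 0.67449 / math.sqrt(2 * n)
psi = cv * math.sqrt(1 + 2 * (cv / 100) ** 2)      # assumed form of psi
pe_cv = chi_2 * psi
print(round(cv, 3), round(pe_cv, 3))
```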


It  is  of  considerable  interest  to  see  how  this  value  measuring 
the  comparative  variability  of  pulse-rate  compares  with  coefficients 
for  variation  in  other  characters  of  medical  interest.  To  this  end 
Table 57 has been inserted. This gives, in descending order,
coefficients of variation for a wide range of physiologic, anatomic,
and pathologic characteristics. These records are taken
from the general literature of biometry.


TABLE  57 

Coefficients  of  Variation  for  Man 


♂  ♀

Weight of spleen (General Hospital population)1 50.58

Steadiness of hand (English)17 48.54-69.59

Visual acuity (English)15 39.12

Weight of spleen (healthy)2 38.21

Dermal sensitivity3 35.70  45.70

Weight of heart (General Hospital population)1 32.39

Interlabral height (American), white9 32.15

Keenness of sight3 28.68  32.21

Strength of grip, all ages, left hand16 26.85

Strength of grip, all ages, right hand16 25.93

Thyroid, area (English, age 13, normal thyroids)11 25.2

Thyroid, area (English, age 13, definite goiters)11 24.9

Weight of kidneys (General Hospital population)1 24.63

Rapidity of hand, females only17 23.91-29.49

Interlabral height (American), negro9 23.42

Body weight (Bavarians)20 21.32  24.715

Weight of liver (General Hospital population)1 21.12

Swiftness of blow3 19.4  17.1

Reaction time to sound (English) 19.149¹⁴  20.20¹³

Reaction time to sight (English) 19.016¹⁴  19.02¹³

Nasal depth (American), negro9 18.34

Intelligence quotient, both sexes18 18.01  18.01

Vital capacity (English, corrected for age)15 17.904

Respiration rate per minute19 17.80

Weight of heart (healthy)2 17.71

Weight of kidneys (healthy)2 16.80

Breathing capacity3 16.6  20.4

Strength of left hand grip (English, age corrected)16 16.27

Auditory acuity15 15.84

Thyroid, breadth (English, age 13, definite goiters)11 15.5

Strength of right hand grip (English, age corrected)16 15.43

Strength of pull3 15.0  19.3

Pulse-rate per minute19 14.89

Weight  of  liver  (healthy)2 14.80 

Nasal  depth  (American),  white9 14.66 

Thyroid,  breadth  (English,  age  13,  normal  thyroids)11 14.6 

Body  weight  (American,  active  tuberculosis)10 13.69 

Body weight (American, age 20-49 years, normal health)10 13.16

Pigmentation (American), white9 12.94

Body  weight  (American,  arrested  tuberculosis)10 12.78 

Internipple  breadth  (American),  white9 12.01 

Height  of  mandible  (English,  both  sexes)4 11.73  11.73 

Blood,  relative  cell  volume  (American,  active  tuberculosis)10  11.13 

Lower-nasal  breadth  (American),  white9 10.53 

Body  weight  (English)3 10.37  13.37 

Skull  capacity  (Etruscan)5 9.58  8.54 

Chest  circumference  (American),  negro9 9.45 

Skull  capacity  (Australians)12 9.27  6.98 

Brain  weight  (French)3 9.16  9.14 

Internipple  breadth  (American),  negro9 8.93 

Mouth  breadth  (American),  white9 8.69 

Pigmentation  (American),  negro9 8.68 

Lower-nasal  breadth  (American),  negro9 8.67 

Chest circumference (American), white9 8.45

Skull  capacity  (modern  Italian)5 8.34  8.99 

Skull  capacity  (English)6 8.28  8.68 

Skull  capacity  (Egyptian  mummies)5 8.13  8.29 

Brain  weight  (Bavarian)20 8.118  8.340 

Brain  weight  (Hessian)20 8.096  8.125 

Chest breadth (American), white9 8.06

Ventral  torso  length  (American),  white9 8.02 

Skull  capacity  (Egyptians)12 7.89  7.08 

Brain  weight  (Bohemian)20 7.809  7.382 

Skull  capacity  (modern  German)5 7.74  8.19 

Skull  capacity  (Naqada)5 7.72  6.92 

Brain  weight  (Swedish)20 7.592  8.043 

Skull  capacity  (Parisian,  French)5 7.36  7.10 

Mouth  breadth  (American),  negro9 7.17 

Skull  capacity  (Aino)5 7.07  6.90 

Chest  breadth  (American),  negro9 6.73 

Upper  arm  length  (American),  negro9 6.42 

Mandible, distance between foramina mentalia (English, both sexes)4 6.23  6.23

Head  neck  length  (American),  negro9 6.20 

Hand  length  (American),  negro9 6.15 

Blood, relative cell volume (American, arrested tuberculosis)10 5.73

Arm length without hand (American), negro9 5.59

Blood, relative cell volume (American, age 20-49, normal health)10 5.42

Length  of  forearm8 5.24  5.21 

Entire  arm  length  (American),  negro9 5.16 

Length  of  femur  (French)3 5.05  5.04 

Length of tibia (French)3 4.975  5.365

Hand  length  (American),  white9 4.97 

Upper  arm  length  (American),  white9 4.93 

Length  of  humerus  (French)3 4.89  5.61 

Head neck length (American), white9 4.88

Length  of  radius  (French)3 4.87  5.23 

Skull,  height  to  breadth  index  (English)6 4.86  4.16 

Skull,  breadth  to  height  index  (English)6 4.83  4.17 

Ventral  torso  length  (American),  negro9 4.81 

Length  of  finger  (English  criminals)7 4.74 

Skull,  ratio  of  height  to  horizontal  length  (English)6 4.61  4.10 

Length  of  foot  (English)7 4.59 

Skull,  cephalic  index  for  horizontal  length  (English)6 4.38  3.99 

Length  of  cubit  (English  criminals)7 4.36 

Skull,  least  breadth  of  forehead  (English)6 4.29  4.55 

Skull,  height  (English)6 4.21  3.96 

Arm  length  without  hand  (American),  white9 4.20 

Skull,  length  of  base  (English)6 4.07  4.11 

Stature  (English)8 3.99  3.83 

Entire  arm  length  (American),  white9 3.97 

Skull,  cephalic  index  for  greatest  length  (English)6 3.95  4.03 

Stature  (American,  active  tuberculosis)10 3.86 

Skull,  ratio  of  height  to  greatest  length  (English)6 3.80  4.21 

Stature  (American,  arrested  tuberculosis)10 3.77 

Skull,  greatest  breadth  (English)6 3.75  3.54 

Skull,  auricular  height  (English)6 3.73  4.12 

Skull,  face  breadth  (English  criminals)7 3.707 

Skull,  cross  circumference  (English)6 3.70  3.97 

Skull,  sagittal  circumference  (English)6 3.63  3.90 

Stature  (American,  age  20-49  years,  normal  health)10 3.60 

Head,  breadth  (English  criminals)7 3.333 

Skull,  length  (English)6 3.31  3.45 

Head,  length  (English  criminals)7 3.154 

Skull,  horizontal  circumference  (English)6 2.87  2.92 

Oral  temperature19 0.49 


1 Greenwood,  M.:  Biometrika,  3,  66,  1904. 

2 Ibid.,  p.  67. 

3 Pearson,  Karl:  The  Chances  of  Death,  vol.  1,  293. 

4 Macdonell,  W.  R.:  Biometrika,  3,  225,  1904. 

5 Ibid.,  p.  221. 

6 Ibid.,  p.  222. 

7 Macdonell,  W.  R.:  Biometrika,  1,  202,  1901-02. 

8 Pearson,  Karl,  and  Lee,  Alice:  Biometrika,  2,  370,  1902-03. 

9 Todd,  T.  W.:  Human  Biology,  1,  65,  1929. 
THE GRAPHIC REPRESENTATION OF RELATIVE VARIABILITY

It  has  been  the  generally  accepted  biometric  practice  to  use  the 
coefficient  of  variation  just  discussed  as  the  measure  of  the  relative 
variability  or  scatter  of  frequency  distributions.  This  constant  is, 
as  we  have  seen, 

$$\text{C. of V.} = \frac{100\ (\text{standard deviation})}{\text{Mean}}.$$

It gives the standard deviation of the distribution in terms of
the mean value of the varying character. By expressing the
scatter of the distribution in this way it becomes possible to compare
the relative variabilities of characters measured in different
absolute units.

But the coefficient of variation has never been an entirely satisfactory
constant, to biologists at least. While formally correct
enough,  within  the  limits  of  its  definition,  it  does  not  readily  or 
instantly  call  up  in  the  mind  an  adequate  picture  of  the  real  degree 
of  scatter  of  the  distribution.  This  is,  in  part  at  least,  because  two 
things,  the  mean  and  the  standard  deviation,  are  involved  in  it. 
When  one  reads  the  value  of  the  standard  deviation  of  a particular 
distribution  it  is  recalled  that  roughly  three  times  this  quantity  on 
either  side  of  the  mean  includes  the  entire  frequency  and  this  gives 
at  once  some  concept  of  the  biological  extent  and  meaning  of  the 
variation,  in  the  particular  case. 

There  would  seem  to  be  a place  of  usefulness  for  an  adequate 
graphical  method  of  depicting  relative  variability  for  comparative 
purposes, so that one may see the difference or likeness in the variation
of a man and a mouse, for example, in respect of body-weight.
It  is  the  purpose  of  this  section  to  describe  such  a graphic  method, 
and  to  illustrate  its  applications. 

The method may best be approached through a concrete illustrative
example. A study9 was made of the normal variation and
correlation  of  the  relative  cell  volume  of  human  blood,  in  relation  to 
age,  body-weight  and  stature.  The  present  situation  regarding  the 
measurement  and  graphical  depiction  of  variation  in  these  four 
characters,  in  a series  of  272  normal  males,  is  fairly  exhibited  in 
Table  58  and  Figs.  80  to  82. 


TABLE  58 

Variation  Constants 


Character.                 Mean.                Standard deviation.   Coefficient of variation (per cent.).

(a) Age                    30.59 ± .21 yrs.     5.22 ± .15 yrs.       17.06 ± .51
(b) Body-weight            151.56 ± .82 lbs.    19.95 ± .58 lbs.      13.16 ± .39
(c) Stature                68.13 ± .10 in.      2.45 ± .07 in.        3.60 ± .10
(d) Relative cell volume   45.59 ± .10 %        2.47 ± .07 %          5.42 ± .16
Fig. 80. Histogram showing variation in body-weight in a group of 272 normal males.

Fig. 81. Histogram showing variation in stature in a group of 272 normal males.

Fig. 82. Histogram showing variation in relative cell volume of the blood in a group of 272 normal males.
Plainly the diagrams (Figs. 80-82) tell nothing whatever about
the  relative  or  comparative  variability  in  this  group  of  males  in 
respect  of  the  three  characters,  body-weight,  stature,  and  relative 
cell  volume.  They  are  correctly  plotted  histograms,  but  the  unit 
of  abscissal  measure  is  different  in  each  case  and  direct  comparison 
is  impossible. 

From  Table  58  we  learn,  through  the  coefficients  of  variation, 
that  the  group  is  from  three  to  five  times  more  variable  relatively 
in  respect  of  age  and  body-weight  than  it  is  in  respect  of  stature 
or  relative  cell  volume.  But  what  does  this  mean  translated  into 
terms  of  distribution  of  frequency?  A simple,  direct  and  easily 
interpreted  answer  is  not  forthcoming. 

Suppose now we decide to express the age, the body-weight,
the stature and the relative cell volume of each of these 272 individuals
as a percentage of their respective mean values, the mean of
each character being taken as 100 per cent. And, further, suppose
we express the frequencies as respectively so much per 1 per cent. of
the mean of each character. These are simple and entirely permissible
transformations of the original data.
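The effect of the first transformation can be illustrated with a small sketch. The observations below are hypothetical, invented only for illustration, and are not the data of the 272 males. The sketch shows that once each value is expressed as a percentage of its own mean, the mean becomes 100 and the standard deviation of the transformed values is the coefficient of variation of the original ones.

```python
import statistics

def as_percent_of_mean(values):
    """Express each observation as a percentage of the sample mean."""
    mean = statistics.fmean(values)
    return [100.0 * v / mean for v in values]

def coefficient_of_variation(values):
    """Standard deviation as a percentage of the mean (population form)."""
    return 100.0 * statistics.pstdev(values) / statistics.fmean(values)

# Hypothetical observations, invented for illustration only.
samples = {
    "stature (in.)": [65.0, 66.5, 68.0, 68.5, 70.0, 71.0],
    "body-weight (lbs.)": [125.0, 140.0, 150.0, 155.0, 170.0, 185.0],
}

for name, values in samples.items():
    pct = as_percent_of_mean(values)
    # After the transformation the mean is 100 and the standard
    # deviation equals the coefficient of variation of the raw data.
    print(name,
          round(statistics.fmean(pct), 2),
          round(statistics.pstdev(pct), 2),
          round(coefficient_of_variation(values), 2))
```

Because every character is brought to the common mean of 100, characters measured in inches and in pounds can be laid on the same abscissal scale.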

The  data  in  their  original  form  and  after  the  transformation 
described  are  shown  in  Table  59. 

If  now  the  figures  in  the  columns  headed  A and  B in  Table  59 
be  plotted  on  arithmetically  ruled  co-ordinate  paper  we  shall  have 
a true  picture  of  the  relative  variability  of  the  four  characters 
considered.  This  is  done  in  Fig.  83.  Each  of  the  four  frequency 
polygons  has  the  same  area,  as  a result  of  the  transformations 
effected  in  the  B columns. 

This  method  of  plotting  superimposes  the  different  polygons  of 
variation  on  a common  Cartesian  co-ordinate  grid,  with  the  mean 
value  for  each  of  the  compared  variables  at  the  same  abscissal 
point. It constitutes no new method of measuring biological variation,
but merely visualizes effectively what the coefficient of variation
measures.

The  method  of  plotting  used  in  Fig.  83  shows  at  a glance  that 
the  272  men  of  this  group  differ  among  themselves  far  more  widely 
in  respect  of  age  and  body-weight  than  they  do  in  respect  of  stature 
or  relative  cell  volume.  The  variation  polygon  for  stature  shows 
the least scatter. That for relative cell volume is somewhat, but
not  greatly,  more  spread.  Those  for  age  and  body-weight  are  wide, 
flat  distributions,  indicating  a relatively  high  variation  in  the  group 
in  respect  of  these  characters. 

Fig. 83. Superimposed variation polygons for (1) relative cell volume, (2) stature, (3)
body-weight, and (4) age, in 272 normal males. See text for further explanation.

One more example will be given. What is the comparative
individual variability of cows in respect of milk production and
of hens in respect of egg production? Table 60 gives the necessary
data regarding (a) milk yield in gallons per week in three-year-old
Ayrshire cows (combined years 1908-09),* and (b) annual egg
production of Barred Plymouth Rock hens (1905-06, 150 bird
pens).†

The  coefficients  of  variation  for  the  distributions  of  Table  60 
are  as  follows: 

Milk  yield:  C.  of  V.  = 17.690  ± .229. 

Egg production: C. of V. = 31.72 ± 1.00.

* Pearl,  R.,  and  Miner,  J.  R.:  Variation  of  Ayrshire  Cows  in  the  Quantity  and 
Fat  Content  of  Their  Milk,  Jour.  Agr.  Research,  vol.  17,  pp.  285-322,  1919. 

† Pearl, R., and Surface, F. M.: A Biometrical Study of Egg Production in the
Domestic  Fowl.  I.  Variation  in  Annual  Egg  Production,  U.  S.  Dept.  Agr.  Bur. 
Anim.  Ind.  Bulletin  110,  Part  I,  pp.  1-80,  1909. 
TABLE 60

Column A: per cent. which mid-point is of mean. Column B: absolute frequency per 1 per cent. of mean. Column C: per mille frequency per 1 per cent. of mean.

Milk yield.

Class limits    Observed absolute
in gallons.     frequency.             A        B        C

 6.50- 6.99           2              48.8      0.6      0.4
 7.00- 7.49           6              52.4      1.7      1.2
 7.50- 7.99           7              56.0      1.9      1.3
 8.00- 8.49           7              59.6      1.9      1.3
 8.50- 8.99           5              63.2      1.4      1.0
 9.00- 9.49          24              66.8      6.6      4.6
 9.50- 9.99          28              70.4      7.8      5.4
10.00-10.49          35              74.1      9.7      6.7
10.50-10.99          56              77.7     15.5     10.8
11.00-11.49          68              81.3     18.8     13.0
11.50-11.99          70              84.9     19.4     13.5
12.00-12.49         107              88.5     29.6     20.5
12.50-12.99         118              92.1     32.7     22.7
13.00-13.49         124              95.7     34.3     23.8
13.50-13.99         119              99.3     32.9     22.8
14.00-14.49         133             103.0     36.8     25.5
14.50-14.99          87             106.6     24.1     16.7
15.00-15.49         102             110.2     28.2     19.6
15.50-15.99          78             113.8     21.6     15.0
16.00-16.49          76             117.4     21.0     14.6
16.50-16.99          43             121.0     11.9      8.3
17.00-17.49          43             124.6     11.9      8.3
17.50-17.99          28             128.2      7.8      5.4
18.00-18.49          20             131.9      5.5      3.8
18.50-18.99          22             135.5      6.1      4.2
19.00-19.49          14             139.1      3.9      2.7
19.50-19.99           5             142.7      1.4      1.0
20.00-20.49           6             146.3      1.7      1.2
20.50-20.99           3             149.9      0.8      0.6
21.00-21.49           2             153.5      0.6      0.4
21.50-21.99           2             157.1      0.6      0.4
22.00-22.49           -             160.8       -        -
22.50-22.99           1             164.4      0.3      0.2

Totals             1441                -        -        -

Egg production.

Class limits        Observed absolute
(number of eggs).   frequency.             A        B        C

  0- 14                   1               6.3     0.08      0.3
 15- 29                   1              18.8     0.08      0.3
 30- 44                   4              31.4     0.32      1.2
 45- 59                  10              44.0     0.80      2.9
 60- 74                  21              56.5     1.67      6.1
 75- 89                  23              69.1     1.83      6.7
 90-104                  35              81.6     2.79     10.1
105-119                  46              94.2     3.66     13.3
120-134                  40             106.8     3.18     11.6
135-149                  35             119.3     2.79     10.1
150-164                  25             131.9     1.99      7.2
165-179                  19             144.4     1.51      5.5
180-194                   8             157.0     0.64      2.3
195-209                   6             169.6     0.48      1.7
210-224                   1             182.1     0.08      0.3

Totals                  275                -        -        -

Using  the  data  as  given  in  columns  A and  C of  Table  60,  Fig.  84 
has  been  plotted.  The  transformation  of  the  absolute  frequencies 
per 1 per cent. of the means given in the B columns to the relative
or  per  mille  frequencies  of  the  C columns  is  necessary  in  order  to 
bring  the  two  polygons  to  the  same  area,  since  the  total  observed 
frequency  in  one  is  1441  and  in  the  other  only  275. 

The greater relative variability in egg production is apparent.

This  method  of  exhibiting  relative  variability  on  an  accurately 
comparative  basis  may  be  summarized  in  the  following  formulas: 
Let A, B, and C denote the figures in the columns so headed in
Tables 59 and 60. Then

$$A = \frac{100\,h}{M}, \qquad B = \frac{M f d}{100}, \qquad C = \frac{1000\,B}{N},$$

where $M$ is the mean, $h$ is the midpoint of a class interval, $f$ is the
absolute frequency of a class, $d$ is the factor by which the class
interval must be multiplied to reduce it to unity (i. e., the reciprocal
of the class interval), and $N$ is the total absolute frequency.

Fig. 84. Polygons showing the relative variability of cows in milk yield (solid line),
and of hens in egg production (dash line). For further explanation see text.
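The three formulas for A, B, and C may be tried on the milk-yield frequencies of Table 60. The sketch below is an illustration and not the computation used for the original table. It assumes that each class midpoint lies half a class interval above the lower limit (so that the class 6.50-6.99 has midpoint 6.75) and that the mean is computed from the grouped frequencies without correction. The function and variable names are illustrative.

```python
def relative_variability_columns(lower_limits, width, freqs):
    """Return the mean and the (A, B, C) figures for each class."""
    n = sum(freqs)                                   # N, total frequency
    mids = [lo + width / 2 for lo in lower_limits]   # h, assumed midpoints
    mean = sum(f * h for f, h in zip(freqs, mids)) / n  # M
    d = 1.0 / width                                  # reciprocal of class interval
    rows = []
    for h, f in zip(mids, freqs):
        a = 100.0 * h / mean       # A = 100 h / M
        b = mean * f * d / 100.0   # B = M f d / 100
        c = 1000.0 * b / n         # C = 1000 B / N
        rows.append((a, b, c))
    return mean, rows

# Observed frequencies of milk yield from Table 60 (classes of 0.5 gallon,
# beginning at 6.50; the class 22.00-22.49 is empty).
milk_freqs = [2, 6, 7, 7, 5, 24, 28, 35, 56, 68, 70, 107, 118, 124, 119,
              133, 87, 102, 78, 76, 43, 43, 28, 20, 22, 14, 5, 6, 3, 2, 2,
              0, 1]
width = 0.5
lower = [6.5 + width * k for k in range(len(milk_freqs))]

mean, rows = relative_variability_columns(lower, width, milk_freqs)

# Area under the C polygon: each class spans 100 * width / mean per cent.
# of the mean, so the total is 1000 per mille whatever the sample size.
area = sum(c * 100.0 * width / mean for _, _, c in rows)

a, b, c = rows[13]  # the class 13.00-13.49
print(round(a, 1), round(b, 1), round(c, 1), round(area, 1))
```

Under these assumptions the figures for the class 13.00-13.49 agree with the tabled values 95.7, 34.3, and 23.8. The constant area of 1000 is what allows polygons based on 1441 cows and 275 hens to be compared directly.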


CONSTANTS  MEASURING  THE  SHAPE  OF  THE  VARIATION  CURVE 

The  Skewness 

So  far  as  any  a priori  reason  is  concerned,  it  is  obvious  that 
variation  curves  might  be  symmetric  about  the  mean  as  a center, 
or they might exhibit any degree of asymmetry, or skewness, the
variates tailing off farther and more gradually on one side of the
curve than on the other. As a matter of fact, a wide range of
asymmetry is found in the variation curves of actual natural phenomena.
It is important to have an exact measure of the degree
or kind of asymmetry exhibited by the curve. Such a constant
has been provided by Pearson and called the skewness. Its value,
$\chi$ denoting skewness, is

$$\chi = \frac{\sqrt{\beta_1}\,(\beta_2 + 3)}{2\,(5\beta_2 - 6\beta_1 - 9)}.$$

The larger the value of $\chi$ the greater is the departure of the curve
from the symmetric “cocked hat” type. The sign of the expression,
which indicates the direction of the skewness or asymmetry,
whether toward large or toward small values of the variates, is
determined generally by giving to $\chi$ the same sign as that of $\mu_3$.
There are certain rare types of curve (J-shaped or U-shaped), in
which this rule fails. The conventional usage as to the direction of
the skewness is as follows: If the curve is skew in the positive direction
($\chi$ +), the median will be smaller than the mean, that is lie to
the left of it as ordinarily plotted, and the curve will tail off more on
the side of high values. If, on the other hand, the median has
larger value than the mean, the curve is negatively skew ($\chi$ −)
and tails off more on the side of low values.

In the case of the normal or Gaussian curve $\chi = 0$, the curve
being symmetric about the mean. The probable error of $\chi$ for
the Gaussian curve is

$$\text{P. E.}_{\chi}\ (\text{Normal curve}) = \pm\, .67449 \sqrt{\frac{3}{2N}}.$$


Consequently, unless the skewness $\chi$ has a value at least four
times  as  large  as  this  probable  error,  it  cannot  safely  be  asserted 
that  the  curve  significantly  departs  from  the  symmetric  Gaussian 
condition.  The  probable  error  of  the  skewness  in  the  general  case 
may  be  calculated  directly  from  tables  given  in  Pearson’s  “Tables 
for  Statisticians  and  Biometricians.” 

For the pulse-rate example we have

$$\chi = \frac{.616768 \times 6.469916}{2\,(17.349580 - 2.282418 - 9)} = \frac{3.990437}{12.134324} = +.3289.$$

The probable error of the skewness for the normal curve of the
same area is

$$\text{P. E.}_{\chi}\ (\text{Normal curve}) = \pm\, .0272.$$

The  skewness  is,  therefore,  more  than  ten  times  as  large  as  this 
probable  error,  and  we  may  safely  conclude  that  this  curve  of 
variation  in  pulse-rate  is  significantly  skew  in  the  positive  direction. 
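The skewness computation may be retraced with a brief sketch. It takes $\sqrt{\beta_1} = .616768$, $\beta_2 = 3.469916$, and $N = 924$ from the worked example, and assumes the probable error for the normal curve in the form $.67449\sqrt{3/(2N)}$. The sign is supplied separately as the sign of $\mu_3$, which, as the text notes, is a rule that fails for certain J-shaped and U-shaped curves. The function names are illustrative.

```python
import math

def pearson_skewness(beta1, beta2, mu3_sign=1.0):
    """Pearson's skewness, given the sign of the third moment."""
    chi = math.sqrt(beta1) * (beta2 + 3) / (2 * (5 * beta2 - 6 * beta1 - 9))
    return math.copysign(abs(chi), mu3_sign)

def probable_error_skewness_normal(n):
    """Probable error of the skewness for a normal curve of n observations."""
    return 0.67449 * math.sqrt(3 / (2 * n))

# Pulse-rate example.
beta1 = 0.616768 ** 2
beta2 = 3.469916
chi = pearson_skewness(beta1, beta2)
pe_chi = probable_error_skewness_normal(924)

# The rule of the text: call the curve skew only when the skewness
# is at least four times its probable error.
significant = abs(chi) >= 4 * pe_chi
print(round(chi, 4), round(pe_chi, 4), round(chi / pe_chi, 1), significant)
```

With these inputs the skewness is about twelve times its probable error, in agreement with the conclusion drawn above.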

Kurtosis 

It  was  shown  by  Pearson5  that  an  important  shape  characteristic 
of  variation  curves  is  the  relative  degree  of  flatness  (or  peakedness) 
in  the  region  about  the  mode,  as  compared  to  the  condition  found 
in  a normal  curve.  To  this  attribute  of  the  curve  he  gave  the  name 
kurtosis.  A curve  is  said  to  be  platykurtic  when  it  is  more  flat- 
topped  (less  peaked)  than  the  Gaussian  curve.  It  is  said  to  be 
leptokurtic when it is less flat-topped (more peaked). The Gaussian
curve is mesokurtic. If $\eta$ denotes kurtosis, then

$$\eta = \beta_2 - 3.$$

If $\eta$ is positive (i. e., $\beta_2 > 3$) the curve is leptokurtic. If $\eta$ is negative
($\beta_2 < 3$) the curve is platykurtic. In the normal or Gaussian
curve $\beta_2 = 3$ with a probable error

$$\text{P. E.}_{\beta_2}\ (\text{normal curve}) = \pm\, .67449 \sqrt{\frac{24}{N}}.$$


An  illustration  of  a leptokurtic  curve  is  given  in  Fig.  85  in 
order  that  the  reader  may  grasp  what  is  meant  by  the  kurtosis  of  a 
curve. 

For our pulse-rate example we have:

$$\eta = 3.469916 - 3 = +.4699.$$

The probable error for a normal curve with 924 observations is

$$\text{P. E.}_{\beta_2} = \pm\, .1087.$$


The kurtosis is, then, in this case more than four times the
probable error, and the curve of pulse-rate variation may be regarded
as significantly leptokurtic.
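The kurtosis example can be verified in the same manner. The sketch assumes the probable error of $\beta_2$ for the normal curve in the form $.67449\sqrt{24/N}$, which reproduces the value .1087 quoted for 924 observations. The function names, and the convention of returning "mesokurtic" when the departure is less than four times the probable error, are illustrative choices.

```python
import math

def kurtosis(beta2):
    """Kurtosis as the excess of beta2 over its normal value of 3."""
    return beta2 - 3.0

def probable_error_beta2_normal(n):
    """Probable error of beta2 for a normal curve of n observations."""
    return 0.67449 * math.sqrt(24 / n)

def describe(eta, pe, factor=4.0):
    """Classify the curve, treating small departures as not significant."""
    if abs(eta) < factor * pe:
        return "mesokurtic"
    return "leptokurtic" if eta > 0 else "platykurtic"

# Pulse-rate example: beta2 = 3.469916, N = 924.
eta = kurtosis(3.469916)
pe_beta2 = probable_error_beta2_normal(924)
print(round(eta, 4), round(pe_beta2, 4), describe(eta, pe_beta2))
```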

We have now determined the chief physical constants which
describe variation. If it is desired to proceed further with the
mathematical analysis what remains to be done is to fit a theoretic
curve to the observed distribution, and calculate the ordinates of
this curve. The methods for doing this are given in detail in Elderton’s
“Frequency Curves and Correlation.” Here space is lacking
to go further into this phase of the matter.

Fig. 85. Histogram and fitted curves for variation in stature of 3915 Scottish
females (insane). The solid curve is the skew curve appropriate to the distribution.
The broken curve is the corresponding normal or Gaussian curve. The skew curve
is leptokurtic. (Plotted from data of Tocher, Biometrika, 5, pp. 298-350.)

THE FREQUENCY CONSTANTS OF A VARIABLE $z = f(x_1, x_2)$*

It often happens in practical biometric work that one desires
to find the frequency constants of a compound character, from a
previous knowledge of the constants of the separate components.
Thus, for example, one measures the length, the breadth, and the
height of each of a series of skulls. He wishes to know at least the
mean and the standard deviation of the diametral product (L ×
B × H). There are two ways open to find the values of these constants.
On the one hand, the length, breadth, and height may be
multiplied together for each individual skull, a frequency distribution
of the products made, and the constants calculated in the
ordinary way; or, on the other hand, by the use of the appropriate
formulae one can deduce straight off the constants for the product
knowing those for the components which enter into the product.
The latter procedure will obviously effect a great saving of labor.

* Cf. Pearl, R.: Biometrika, vol. 6, pp. 437, 438, 1909; Reed, L. J.: Jour. Washington
Acad. Sci., vol. 11, pp. 449-455, 1921.

The  formulae  for  determining  the  mean  and  standard  deviation 
of a character $z = f(x_1, x_2)$ when the same constants and the
coefficient of correlation for $x_1$ and $x_2$ are known, are well known
to  mathematicians.  They  are  not  so  familiar  to  many  of  those 
who  have  approached  the  field  of  biometry  along  the  biologic 
pathway. 

The general method of deducing these formulas will be clear to
anyone who will carefully study Pearson’s paper “On a Form of
Spurious Correlation which may arise when Indices are used in
the Measurement of Organs,”* wherein the formulas for $z = x_1/x_2$
are discussed. The general formulas for $z = f(x, y)$ will also be
found discussed in the Phil. Trans., vol. 187a, p. 278, 1896, and
by Reed (loc. cit.).

In the formulae given in Table 61 the various letters have the
following meanings:

$x_1$, $x_2$, and $x_3$ the separate characters involved in the compound
character $z$.

$m_1$, $m_2$, and $m_3$ the means of the characters $x_1$, $x_2$, and $x_3$.

$\sigma_1$, $\sigma_2$, and $\sigma_3$ the standard deviations of $x_1$, $x_2$, and $x_3$.

$v_1 = \sigma_1/m_1$, $v_2 = \sigma_2/m_2$, $v_3 = \sigma_3/m_3$. (The $v$'s are the ordinary coefficients
of variation divided by 100.)

$r$ denotes the coefficient of correlation (see next chapter) between
the two characters designated by the subscripts.

* Proc. Roy. Soc., vol. 60, pp. 489-498, 1897.
The table gives the formulae for the mean and standard deviation
of

(a)  the  sum  of  two  and  three  variables, 

(b)  the  difference  of  two  variables, 

(c)  the  product  of  two  and  of  three  variables, 

(d)  the  quotient  of  two  variables  (index). 

In  certain  of  the  cases  the  formulae  are  approximations,  but  very 
close  ones.  The  nature  of  the  approximations  made  is  indicated 
in  the  table. 

TABLE  61 


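Constants of $z = f(x_1, x_2)$.

$z = x_1 + x_2$
Mean of $z$: $m_1 + m_2$
Standard deviation of $z$: $\sqrt{\sigma_1^2 + \sigma_2^2 + 2 r_{12} \sigma_1 \sigma_2}$

$z = x_1 + x_2 + x_3$
Mean of $z$: $m_1 + m_2 + m_3$
Standard deviation of $z$: $\sqrt{\sigma_1^2 + \sigma_2^2 + \sigma_3^2 + 2 r_{12} \sigma_1 \sigma_2 + 2 r_{13} \sigma_1 \sigma_3 + 2 r_{23} \sigma_2 \sigma_3}$

$z = x_1 - x_2$
Mean of $z$: $m_1 - m_2$
Standard deviation of $z$: $\sqrt{\sigma_1^2 + \sigma_2^2 - 2 r_{12} \sigma_1 \sigma_2}$

$z = x_1 \cdot x_2$
Mean of $z$: $m_1 m_2 + r_{12} \sigma_1 \sigma_2$
Standard deviation of $z$: $m_1 m_2 \left[ v_1^2 + v_2^2 + 2 r_{12} v_1 v_2 + v_1^2 v_2^2 (1 + r_{12}^2) \right]^{1/2}$,* or approximately $m_1 m_2 \left[ v_1^2 + v_2^2 + 2 r_{12} v_1 v_2 \right]^{1/2}$

$z = x_1 \cdot x_2 \cdot x_3$
Mean of $z$: $m_1 m_2 m_3 \left[ 1 + r_{12} v_1 v_2 + r_{13} v_1 v_3 + r_{23} v_2 v_3 \right]$
Standard deviation of $z$: $m_1 m_2 m_3 \left[ v_1^2 + v_2^2 + v_3^2 + 2 r_{12} v_1 v_2 + 2 r_{13} v_1 v_3 + 2 r_{23} v_2 v_3 \right]^{1/2}$ approximately

$z = x_1 / x_2$
Mean of $z$: $\dfrac{m_1}{m_2} \left( 1 + v_2^2 - r_{12} v_1 v_2 \right)$
Standard deviation of $z$: $\dfrac{m_1}{m_2} \sqrt{v_1^2 + v_2^2 - 2 r_{12} v_1 v_2}$

* This formula, due to J. F. Tocher, depends on the assumption of normal correlation,
see Biometrika, vol. iv, p. 320. The approximate value depends on neglecting
higher powers of the coefficients of variation. The formula for the mean of the double
product (Tocher, loc. cit.) is exact. The formula for the mean of the triple product is
not exact, any more than the formula for the s. d. of the triple product (see Tocher,
loc. cit., p. 321). The formulae for the mean and s. d. of an index are only true to the
lowest powers in $v_1$ and $v_2$, and must not be applied if $v_1$ and $v_2$ are large. The formulae
for $z = x_1 \pm x_2$, the sums or differences of any number of variables, are exact for both
mean and s. d.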
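The two-variable formulae of Table 61 lend themselves to a compact sketch. The code below is an illustration under the table's own assumptions (normal correlation for the fuller product formula, and small coefficients of variation for the index). The function names and the numerical inputs are hypothetical and are not drawn from any data in the text.

```python
import math

def sum_constants(m1, s1, m2, s2, r12, sign=1):
    """Mean and s. d. of x1 + x2 (sign=1) or x1 - x2 (sign=-1)."""
    mean = m1 + sign * m2
    sd = math.sqrt(s1 ** 2 + s2 ** 2 + sign * 2 * r12 * s1 * s2)
    return mean, sd

def product_constants(m1, s1, m2, s2, r12, approximate=False):
    """Mean and s. d. of x1 * x2; the fuller form assumes normal correlation."""
    v1, v2 = s1 / m1, s2 / m2
    mean = m1 * m2 + r12 * s1 * s2
    inner = v1 ** 2 + v2 ** 2 + 2 * r12 * v1 * v2
    if not approximate:
        inner += v1 ** 2 * v2 ** 2 * (1 + r12 ** 2)
    return mean, m1 * m2 * math.sqrt(inner)

def quotient_constants(m1, s1, m2, s2, r12):
    """Approximate mean and s. d. of the index x1 / x2 (small v1, v2 only)."""
    v1, v2 = s1 / m1, s2 / m2
    mean = (m1 / m2) * (1 + v2 ** 2 - r12 * v1 * v2)
    sd = (m1 / m2) * math.sqrt(v1 ** 2 + v2 ** 2 - 2 * r12 * v1 * v2)
    return mean, sd

# Hypothetical constants: means 10 and 20, s. d. 1 and 2, correlation 0.5.
args = (10.0, 1.0, 20.0, 2.0, 0.5)
print(sum_constants(*args))
print(sum_constants(*args, sign=-1))
print(product_constants(*args))
print(product_constants(*args, approximate=True))
print(quotient_constants(*args))
```

For these inputs the fuller and the approximate standard deviations of the product differ only in the third significant figure, which shows how little is lost by neglecting the higher powers of the coefficients of variation when these are small.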


CLASS  LIMITS 

A practical  question  which  frequently  arises  to  vex  the  beginning 
statistician in making tables, for the purpose of computing variation
constants, is as to how fine the grouping shall be in a table
based  upon  a linear  classification.  Or,  to  put  it  in  another  way, 
shall the class limits be narrow or broad? The only general statement
which can be made on this point is this: The degree of
fineness of grouping which is permissible depends upon the total
magnitude  of  the  experience.  It  is  idle  to  expand  a small  observed 
universe  into  fine  categories,  leaving  many  cells  with  no  frequency 
or  a frequency  of  only  1.  A safe  working  rule  in  setting  up  tables  of 
frequency  is:  (a)  to  arrange  the  class  limits  so  as  to  have  from  8 to  15 
classes, depending upon the absolute magnitude of the total experience,
and (b) never to have fewer than 5 classes or more than 20
to 25. As a matter of fact the coarseness or fineness of the elemental
class  units  of  grouping  makes  (within  wide  limits)  extremely  little 
difference  in  the  values  of  derived  biometric  constants. 

The  statement  is  frequently  made,  either  in  comment  or  criticism 
upon  biometric  work,  that  such  work  is  often  caused  to  take  on  an 
unwarranted  appearance  of  precision  and  exactness  by  the  keeping 
of  a larger  number  of  decimal  places  in  the  tabled  constants  than 
the  character  of  the  original  data  justifies.  The  contention  is  made 
that under no circumstances whatsoever can any statistical constant
be more accurate than the data on which it is based. It is
held  that  if  one  makes  a series  of  measurements  accurate  to  a tenth 
of  a millimeter,  it  is  a logical  absurdity  to  table  the  mean  and 
standard  deviation  deduced  from  these  measurements  to  hundredths 
of  a millimeter.  Not  only  is  this  contention  made  from  time  to 
time  by  biologists,  but  occasionally  even  by  a mathematician  who 
ought  to  know  better,  a fact  which,  of  course,  tends  strongly  to 
confirm  the  biologist  in  his  opinion. 

The  reply  which  the  statistician  makes  to  the  criticism  that 
constants  cannot  be  more  accurate  than  the  data  on  which  they 
are  based  is,  in  general  terms,  that  the  accuracy  of  a statistical 
constant depends not alone on the accuracy of the original measurements
but also upon the number of such measurements. Further,
it  is  pointed  out  that,  because  of  this  fact,  it  is  possible  to  deduce 
from  measurements  known  to  be  individually  inaccurate  constants 
of a high degree of accuracy, provided that the errors in the measurements
are unbiased (that is, as often in excess as in defect of the
true  value)  and  that  there  are  enough  of  the  data.  Finally  the 
statistician  contends  that  the  only  proper  measure  of  the  accuracy 
of  a statistical  constant  (always  assuming  that  the  original  data 
are  not  collected  in  a deliberately  dishonest  or  biased  manner)  is 
its “probable error.” Unfortunately this statement of the case
appears not to carry conviction to the non-statistical worker. It
has seemed to the writer that if the assertion made by the statistician
regarding the point under discussion is true, it ought to be
possible  to  demonstrate  it  in  such  a manner  as  to  carry  conviction 
to  anybody. 

With  this  object  in  view  the  experiment  to  be  described  was 
tried.10  Some  time  ago  the  writer  measured  for  another  purpose 
the  lengths  of  450  hens’  eggs.  The  measurements  were  made  with 
a large  steel  micrometer  caliper  manufactured  by  Browne-Sharpe 
& Co.,  reading  directly  to  hundredths  of  a millimeter.  The  utmost 
care  was  exercised  in  the  making  of  the  measurements;  they  were 
all  made  under  the  same  conditions  as  to  light,  temperature,  etc.; 
the  caliper  was  held  in  a specially  constructed  stand  to  get  rid  of 
the  error  arising  from  expansion  and  contraction  if  it  is  held  in  the 
hand;  the  micrometer  screwhead  was  fitted  with  a ratchet  which 
mechanically  insures  that  the  same  pressure  shall  be  exerted  on  the 
object  in  every  case;  all  measurements  were  made  by  the  same 
observer  who  had  had  considerable  experience  in  close  micrometer 
measuring.  The  maximum  length  was  the  thing  measured.  There 
is  every  reason  to  believe  that  these  measurements  to  hundredths 
of a millimeter are as accurate as it is possible to make them with
the  instrument  used.  This  being  the  case  all  will  agree  that  any 
statistical  constant  deduced  from  them  can  be  held  to  be  accurate 
to  hundredths  of  a millimeter  at  least.  Now  let  it  be  supposed 
that  these  eggs  had  been  measured  only  to  the  nearest  millimeter 
instead  of  the  nearest  hundredth  of  a millimeter.  By  how  much 
would  the  statistical  constants  deduced  from  the  “millimeter”  data 
differ  from  those  deduced  from  the  “hundredth  millimeter  data”? 

It  will  be  recognized  that  the  problem  involved  in  this  question 
is  identical  with  that  of  the  influence  of  fineness  of  grouping  in 
statistical  series  upon  the  values  of  derived  constants. 

To answer this question it is necessary to calculate some statistical
constant for the two sets of data. The mean was chosen as
the  simplest  possible  constant.  The  actual  measurements  to 
hundredths  of  a millimeter  were  used  as  one  set  of  data.  The 
“millimeter”  data  were  obtained  by  discarding  the  decimals  of  the 




MEDICAL  BIOMETRY  AND  STATISTICS 


original  measurements.  In  this  discarding  a record  was  raised 
1 mm.  whenever  the  decimal  portion  of  the  original  figure  was  .51 
or  greater.  When  the  decimal  part  of  the  record  was  .49  or  less 
the  integral  part  stood  unchanged.  In  the  450  measurements 
there  were  6 cases  in  which  the  decimal  portion  of  the  record  was 
exactly  .50.  In  one-half  of  these  cases  the  record  was  raised  1 mm. 
and  in  the  other  half  was  left  unchanged,  when  the  decimals  were 
discarded.  This  is  obviously  the  only  fair  way  of  dealing  with 
such  cases  since,  for  example,  51.50  is  exactly  as  near  to  51  as  to  52. 

The  original  measurements  and  the  “millimeter”  data  after 
discarding  the  decimals  were  then  each  added  and  re-added  with  a 
calculating  machine.  The  resulting  sums  were: 

When the measurements were kept to      When the measurements were kept to
the nearest hundredth of a mm.          the nearest whole mm.

25,341.95                               25,346

Dividing each of these figures by the total number of cases,
450, we get for the means the following:

Mean from “hundredth mm. data”          Mean from “millimeter data”

56.3154                                 56.3244

The  difference  between  these  two  figures  is  .009.  That  is,  there 
is  no  difference  between  the  two  averages  until  the  third  decimal 
place  is  reached.  To  two  places  of  figures  both  means  are  56.32. 
But  this  can  only  mean  that  the  mean  or  average  obtained  when 
the  records  are  made  only  to  the  nearest  millimeter  is  more 
accurate,  by  two  places  of  decimals,  than  the  data  on  which  it  is 
based. 
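
The effect can be tried on made-up figures. The sketch below is only an illustration under stated assumptions: the 450 lengths are simulated from a normal curve whose mean (56.3 mm.) and standard deviation (2.0 mm.) are hypothetical values chosen for the purpose, not the writer's egg measurements. It also repeats the division of the two printed sums by 450.

```python
import random
import statistics


def mean_before_and_after_rounding(n=450, mu=56.3, sigma=2.0, seed=1):
    """Return (mean of fine readings, mean of the same readings in whole mm.)."""
    rng = random.Random(seed)
    # Hypothetical lengths recorded to hundredths of a millimeter.
    fine = [round(rng.gauss(mu, sigma), 2) for _ in range(n)]
    # Python's round() sends exact halves to the nearest even integer, so,
    # much like the rule in the text, about half are raised and half lowered.
    coarse = [round(x) for x in fine]
    return statistics.mean(fine), statistics.mean(coarse)


# The arithmetic printed in the text: each sum divided by 450 cases.
book_fine_mean = 25341.95 / 450
book_coarse_mean = 25346 / 450

fine_mean, coarse_mean = mean_before_and_after_rounding()
print(f"printed sums:   {book_fine_mean:.4f} and {book_coarse_mean:.4f}")
print(f"simulated data: {fine_mean:.4f} and {coarse_mean:.4f}")
```

Each rounded record may be wrong by as much as half a millimeter, yet the two means should differ by a far smaller amount, since unbiased errors largely cancel in the sum.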

In interpreting this statement of fact it must not be held to
signify that biometric measurements should not be made with the
greatest attainable degree of accuracy. The fact that statistical
constants, when the number of cases dealt with is large, are more
accurate than the data on which they are based gives no excuse for
rough measuring. The reason for this, of course, lies in the principle,
which actual experience shows to be correct, that the finer and
more accurate the measuring, the less chance of the data being
unconsciously biased. Statistical constants can only be more
accurate than the original data when the data are strictly unbiased.
The “applied psychology” of practical measuring teaches that
unconscious bias goes out of the records just in proportion as the
measurements are made finer.

CHAPTER XIV


THE  MEASUREMENT  OF  CORRELATION 

A phase  of  biometric  technic  which  is  of  the  highest  importance 
and  usefulness  is  that  of  correlation  in  variation.  By  the  use  of  this 
technic  complicated  problems,  which  could  be  efficiently  attacked  in 
no other way, may be solved. Pearson defines correlation in the following
terms: “Two organs in the same individual, or in a connected
pair  of  individuals,  are  said  to  be  correlated  when,  a series  of  the 
first  organ  of  a definite  size  being  selected,  the  mean  of  the  sizes 
of  the  corresponding  second  organs  is  found  to  be  a function  of  the 
size  of  the  selected  first  organ.  If  the  mean  is  independent  of  this 
size,  the  organs  are  said  to  be  non-correlated.  Correlation  is 
defined  mathematically  by  any  constant,  or  series  of  constants, 
which  determine  the  above  function.” 

This  definition  will  be  more  intelligible  if  we  look  at  the  matter 
a little  from  the  standpoint  of  probability. 

THE  GENESIS  OF  CORRELATION 

Suppose  we  carry  out  some  experiments  in  tossing  12  pennies 
together,  in  this  manner;  make  a first  toss  and  record  the  number 
of  heads,  then  pick  up  the  pennies  and  make  a second  toss.  Then 
enter  the  results  of  both  tosses  in  a double  entry  table.  Thus  if 
on the first toss there fell 7 heads and on the second toss 5 heads,
this would be entered as a frequency of 1 in the cell of Table 62
where the 7 column (first toss) crosses the 5 row (second toss).
Continue  this  process  till  500  pairs  of  throws  have  been  made. 
The result will be similar to that exhibited in Table 62.*

* This and the following similar tables are taken from Darbishire.1 His
experiments were actually made with dice, but the method of recording
was such as to make them precisely equivalent to penny-tossing, and
they are capable of more simple statement in the latter form.

Now, plainly, any particular number of heads in the second toss
is in this table associated with any given number in the first toss
only about as frequently as would be expected from the proportion
of that number of heads in the whole experience of first tosses.
In other words, the distribution of second toss heads is about
random relative to first toss heads. This is what would be expected
a priori because there is no way in which the result of the first toss
can affect the result of the second. The two tosses are independent
random events. Therefore their results cannot show any sensible
quantitative association or correlation with each other.

TABLE 62

Relation Between the Number of Heads Falling in Successive Random
Tosses of 12 Pennies Together

[The body of this table (heads in first toss, 0 to 12, against heads in
second toss, 0 to 12; 500 pairs of throws) is not legible in this copy.]

r (calc.) = +.055 ± .030          r (theory) = 0

But now suppose matters to be arranged so that the result of the
first toss can influence the result of the second. This can easily
be done by marking one of the pennies so that it can always be
recognized, and then after the first throw leaving this marked penny
on the table while the remaining 11 pennies are picked up and tossed
at random in order to give, together with the marked penny left
undisturbed, the second toss. The consequence of this procedure
will be that one penny, the marked one, contributes the same
element (head or tail as the case may be) to both tosses. The
general result of proceeding in this way is shown in Table 63.

TABLE 63

Heads in Successive Tosses Where 11 Pennies Are Tossed in the Second
Throw and 1 Remains as it Fell in the First Throw of 12 Together

[The body of this table (heads in first toss against heads in second
toss; 500 pairs of throws) is not legible in this copy.]

r (calc.) = +.073 ± .030          r (theory) = .083


This  Table  63  is,  in  theory,  not  quite  like  Table  62,  although  to 
the  eye  it  still  is  very  similar. 

If  the  process  be  now  continued,  leaving  down  successively 
more and more of the pennies and having them pass over undisturbed
from first to second toss, we shall get the results shown in the
tables  which  follow.  Table  64  shows  the  result  of  marking  2 
pennies  and  leaving  them  down;  Table  65,  of  marking  3 pennies 
and  leaving  them  down,  and  so  on  up  to  all  12  pennies. 


TABLE 64

Heads in Successive Tosses Where 10 Pennies Are Tossed in the Second
Throw and 2 Remain as They Fell in the First Throw of 12 Together

[The body of this table is not legible in this copy.]

r (calc.) = +.194 ± .029          r (theory) = .167

TABLE 65

Heads in Successive Tosses Where 9 Pennies Are Tossed in the Second Throw
and 3 Remain as They Fell in the First Throw of 12 Together

[The body of this table is not legible in this copy.]

r (calc.) = +.278 ± .028          r (theory) = .250

TABLE 66

Heads in Successive Tosses Where 8 Pennies Are Tossed in the Second Throw
and 4 Remain as They Fell in the First Throw of 12 Together

[The body of this table is not legible in this copy.]

r (calc.) = +.342 ± .026          r (theory) = .333

TABLE 67

Heads in Successive Tosses Where 7 Pennies Are Tossed in the Second Throw
and 5 Remain as They Fell in the First Throw of 12 Together

[The body of this table is not legible in this copy.]

r (calc.) = +.432 ± .025          r (theory) = .417


TABLE 68

Heads in Successive Tosses Where 6 Pennies Are Tossed in the Second Throw
and 6 Remain as They Fell in the First Throw of 12 Together

[The body of this table is not legible in this copy.]

r (calc.) = +.449 ± .024          r (theory) = .500

TABLE 69

Heads in Successive Tosses Where 5 Pennies Are Tossed in the Second Throw
and 7 Remain as They Fell in the First Throw of 12 Together

[The body of this table is not legible in this copy.]

r (calc.) = +.578 ± .020          r (theory) = .583

TABLE 70

Heads in Successive Tosses Where 4 Pennies Are Tossed in the Second Throw
and 8 Remain as They Fell in the First Throw of 12 Together

[The body of this table is not legible in this copy.]

r (calc.) = +.676 ± .016          r (theory) = .667

TABLE 71

Heads in Successive Tosses Where 3 Pennies Are Tossed in the Second Throw
and 9 Remain as They Fell in the First Throw of 12 Together

[The body of this table is not legible in this copy.]

r (calc.) = +.765 ± .012          r (theory) = .750


TABLE 72

Heads in Successive Tosses Where 2 Pennies Are Tossed in the Second Throw
and 10 Remain as They Fell in the First Throw of 12 Together

[The body of this table is not legible in this copy.]

r (calc.) = +.840 ± .009          r (theory) = .833

TABLE 73

Heads in Successive Tosses Where 1 Penny is Tossed in the Second Throw
and 11 Remain as They Fell in the First Throw of 12 Together

[The body of this table is not legible in this copy.]

r (calc.) = +.910 ± .005          r (theory) = .917

TABLE 74

Heads in Successive Tosses Where No Penny is Tossed in the Second Throw
and 12 Remain as They Fell in the First Throw of 12 Together

[The body of this table is not legible in this copy.]

r (calc.) = 1          r (theory) = 1


In  this  series  of  tables  is  seen  the  genesis  of  correlation.  In 
Table  62  the  results  of  the  first  toss  have  no  influence  on  the  results 
of  the  second.  There  is  no  correlation  between  them.  In  Table 
74 the results of the first toss completely determine, or cause, the
results of the second. This gives perfect correlation (or, in this
particular case, causation) between the two.

In  all  the  tables  the  diagonal  lines  cut  off  the  cells  in  which 
events  cannot  possibly  happen. 

Just  below  each  of  these  tables  there  have  been  placed  two 
values  of  r (the  coefficient  of  correlation,  which  is  discussed  in 
detail in a later section). The first one of these is the value calculated
directly from the table itself. The other is the theoretical
value  which  is  a consequence  of  the  number  of  pennies  left  down 
from  the  first  toss,  according  to  the  theory  of  probability.* 

* Rietz,  H.  L.:  Urn  Schemata  as  a Basis  for  the  Development  of  Correlation 
Theory,  Annals  of  Math.,  vol.  21,  pp.  306-322,  1920. 
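
The penny experiment is easily imitated by machine. The following is a minimal sketch, not Darbishire's procedure: the function names, the seed, and the use of a pseudo-random generator in place of dice are all assumptions made for illustration. The theoretical value is taken as the number of pennies left down divided by 12, which reproduces the r (theory) figures printed under the tables (.083, .167, .250, and so on).

```python
import random


def pearson_r(xs, ys):
    """Product-moment correlation of two equally long lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def toss_pairs(kept, pairs=500, pennies=12, seed=0):
    """Heads in a first toss and in a second toss that re-uses `kept` pennies."""
    rng = random.Random(seed)
    first, second = [], []
    for _ in range(pairs):
        coins = [rng.randint(0, 1) for _ in range(pennies)]  # 1 = head
        first.append(sum(coins))
        # The first `kept` pennies stay as they fell; the rest are tossed again.
        retossed = [rng.randint(0, 1) for _ in range(pennies - kept)]
        second.append(sum(coins[:kept]) + sum(retossed))
    return first, second


def theoretical_r(kept, pennies=12):
    """Share of pennies common to both tosses."""
    return kept / pennies


for kept in range(13):
    first, second = toss_pairs(kept)
    print(kept, round(pearson_r(first, second), 3), round(theoretical_r(kept), 3))
```

With 500 pairs the calculated values will scatter about the theoretical ones by roughly the probable errors shown under the tables; raising `pairs` narrows the scatter.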
THE CORRELATION TABLE AND REGRESSION

Suppose one wished an answer to this question: What quantitative
relation, if any, exists between brain-weight and skull
length?  One  knows  from  general  anatomic  relations  that  there 
must  be  some  association  between  these  phenomena.  A long  head 
and  a heavy  brain  are  often  observed  together  in  the  same  individual. 
But  in  a statistical  sense,  how  close  is  this  association  in  general? 
What  is  its  quantitative  degree  of  intensity? 

Quite  obviously  the  way  to  start  getting  an  answer  to  this 
question  is  to  collect  information,  on  as  many  persons  as  possible, 
as  to  the  brain  weight  and  the  skull  length  in  the  same  individual. 
Having  this  information,  one  may  set  up  a table  like  Table  75. 
This  table  is  taken  from  a paper  by  the  present  writer,*  the  original 
data having been collected by Matiegka.†

TABLE 75

Correlation Between Brain-weight and Skull Length. Bohemian Males,
Twenty to Fifty-nine Years of Age

                               Brain-weight (grams)
Skull       1000-  1100-  1200-  1300-  1400-  1500-  1600-  1700-  1800-           Midpoint       Mean of
length       1099   1199   1299   1399   1499   1599   1699   1799   1899   Totals  of class  brain-weight
(mm.)                                                                                  range         array
155-159        ..     ..      1      1     ..     ..     ..     ..     ..        2     157.5          1300
160-164        ..     ..      2      6      4      2     ..     ..     ..       14     162.5          1393
165-169         1     ..      9     10     18      3      1     ..     ..       42     167.5          1386
170-174        ..     ..      5     19     28     11      4      1     ..       68     172.5          1440
175-179        ..     ..      4     19     29     23      4     ..     ..       79     177.5          1455
180-184        ..     ..     ..     10     19     23      8      1     ..       61     182.5          1502
185-189        ..     ..     ..      1      2     12      4     ..     ..       19     187.5          1550
190-194        ..     ..     ..     ..      1      2      3      4     ..       10     192.5          1650
195-199        ..     ..     ..     ..     ..      1      1     ..      2        4     197.5          1725
Totals          1     ..     21     66    101     77     25      6      2      299

Midpoints of class ranges of brain-weight:
             1050   1150   1250   1350   1450   1550   1650   1750   1850

Means of skull length arrays:
            167.5     ..  169.6  173.8  175.0  179.7  182.1  187.5  197.5

* Pearl, R.: Biometrical Studies on Man. I. Variation and Correlation in
Brain-weight, Biometrika, vol. 4, pp. 13-104, 1905.

† Matiegka, H.: Über das Hirngewicht, die Schädelkapacität und die Kopfform.
Sitzber. des kön. böhmischen Gesellsch. d. Wiss., Math.-Nat. Cl., Jahrg. 1902, No.
xx, pp. 1-75.

A table of this sort is known as a correlation table. It is a table
of double entry, which enables one to read off, for example, that
there were in the total experience 18 persons who had a brain-weight
of 1400-1499 grams, and a skull length of 165-169 mm.
It is made up of a series of rows and columns, each of which is, of
itself, a frequency distribution. Each row and each column is
called technically an array. Thus there is an array of skull lengths
(a column) associated with a midrange brain-weight of 1450, and
similarly there is an array of brain-weights (a row) associated with
a skull length of 172.5, and so on.

Geometrically  the  table  may  be  represented  best  as  a surface. 
Call  brain-weight  the  x coordinate,  and  skull  length  the  y coordinate. 
Then  the  frequencies  in  each  cell  must  be  represented  by  the 
volumes  (instead  of  areas  as  in  simple  frequency  distributions)  of 
rectangular  solids  with  one  end  of  each  one  covering  the  cell  on 
which  it  stands,  and  their  heights  reading  on  the  z coordinate. 
Now  suppose  the  tops  of  these  cells  to  be  connected  with  each  other 
and covered by a smooth surface. The general shape of the resulting
surface will usually be quite strikingly similar to that of the
“tin  hats”  worn  by  the  United  States  soldiers  in  the  late  war. 

Each  array  may  be  treated  biometrically  as  an  independent 
frequency distribution, and the mean, standard deviation, etc.,
determined.  The  first  step  in  this  direction  leads  to  the  array 
means  given  on  the  margins  of  Table  75.  These  array  means, 
taken  in  connection  with  the  midpoints  of  the  class  ranges  of  the 
other  variable  set  next  to  them,  at  once  bring  out  an  interesting 
point.  It  is  that  as  the  midpoints  of  the  brain-weight  class  range 
(let  us  say)  increase  as  we  pass  from  left  to  right,  there  is  a slightly 
irregular  but  still  perfectly  definite  tendency  for  the  means  of  the 
corresponding  skull  length  arrays  to  increase. 
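
The array means can be checked by direct computation. The sketch below is only an illustration: the dictionary layout and the function names are assumptions, and the cell frequencies are those of Table 75 entered by class midpoints.

```python
# Cell frequencies of Table 75: skull-length midpoint (mm.) ->
# {brain-weight midpoint (grams): frequency}.
TABLE_75 = {
    157.5: {1250: 1, 1350: 1},
    162.5: {1250: 2, 1350: 6, 1450: 4, 1550: 2},
    167.5: {1050: 1, 1250: 9, 1350: 10, 1450: 18, 1550: 3, 1650: 1},
    172.5: {1250: 5, 1350: 19, 1450: 28, 1550: 11, 1650: 4, 1750: 1},
    177.5: {1250: 4, 1350: 19, 1450: 29, 1550: 23, 1650: 4},
    182.5: {1350: 10, 1450: 19, 1550: 23, 1650: 8, 1750: 1},
    187.5: {1350: 1, 1450: 2, 1550: 12, 1650: 4},
    192.5: {1450: 1, 1550: 2, 1650: 3, 1750: 4},
    197.5: {1550: 1, 1650: 1, 1850: 2},
}


def skull_length_array_means(table):
    """Mean skull length within each brain-weight class (the column arrays)."""
    totals, weighted = {}, {}
    for skull, row in table.items():
        for brain, freq in row.items():
            totals[brain] = totals.get(brain, 0) + freq
            weighted[brain] = weighted.get(brain, 0.0) + freq * skull
    return {brain: weighted[brain] / totals[brain] for brain in sorted(totals)}


def brain_weight_array_means(table):
    """Mean brain-weight within each skull-length class (the row arrays)."""
    return {
        skull: sum(brain * freq for brain, freq in row.items()) / sum(row.values())
        for skull, row in table.items()
    }


for brain, mean in skull_length_array_means(TABLE_75).items():
    print(f"brain-weight {brain}: mean skull length {mean:.1f} mm.")
for skull, mean in brain_weight_array_means(TABLE_75).items():
    print(f"skull length {skull}: mean brain-weight {mean:.0f} grams")
```

Read in order of increasing brain-weight, the skull length means rise from class to class, which is the tendency described above.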

This  fact  can  be  made  more  apparent  graphically  as  seen  in 
Fig.  86. 

The lines formed by plotting the means of the arrays are
called observed regression lines, regression being a term introduced
into statistical usage by Galton. The manner in which
the calculated regression lines are derived will be explained in the
next section.

It is apparent from Fig. 86 that the slope of the regression lines
gives a means of measuring the degree of correlation or association
of variation between the variables. For suppose AB to be rotated
about O as an axis until it exactly coincided with YY, and CD to be
rotated about O until it exactly coincided with XX. Then there
would be no increase in brain-weight associated with an increase in
skull length, or vice versa. Actually the method used for measuring
correlation, as will be shown in the next section, does make use of
just this principle.

Fig. 86. Observed and calculated regressions for brain-weight and skull length
from Table 75. The crosses are the means of the observed skull length arrays
(observed regression of skull length on brain-weight). AB is the calculated regression
line of skull length on brain-weight. The circles are the means of the observed
brain-weight arrays (observed regression of brain-weight on skull length). CD is the
corresponding calculated regression line. XX gives the location on the brain-weight
scale of the mean of all 299 brain-weights. YY gives the mean of all skull lengths
on the skull length scale. [Horizontal axis: brain-weight (grams), 1000 to 1900.]

THE MEASUREMENT OF SIMPLE CORRELATION WITH LINEAR REGRESSION.
THE CORRELATION COEFFICIENT

In the simplest and fundamental case correlation between two
variables is measured by a coefficient

\[ r_{12} = \frac{S(x_1 x_2)}{N \sigma_1 \sigma_2}, \]

where $r_{12}$ is the coefficient of correlation between the two variables
$X_1$ and $X_2$, of which $\sigma_1$ and $\sigma_2$ are the respective standard deviations
and $N$ is the number of pairs of variates. $S$ denotes summation,
and $x_1$ and $x_2$ are deviations from the means of $X_1$ and $X_2$ respectively.
This coefficient may take any value between 0, which is
the result when there is no correlation at all between the variables,
and either +1 or -1. When either of the latter values occurs
it means that the correlation is perfect, i. e., for every change in
one of the variables there is a definite and constant proportional
change in the value of the other. A positive correlation means
that as one variable increases in value the other variable also
increases and vice versa. A negative correlation means that as one
variable increases the other decreases. The coefficient of correlation
has a probable error, which takes the following value:

When $N$ is, say, 25 or more,

\[ \mathrm{P.E.}_r = .67449 \, \frac{1 - r^2}{\sqrt{N}}. \]

When  it  is  desired  to  test  whether  an  observed  correlation  coefficient 
is  significantly  different  from  zero,  r in  the  above  formula  should 
be  put  = 0 in  calculating  the  P.  E.  For  very  small  numbers 
( N < 25)  special  caution  must  be  used  in  estimating  the  reliability 
of  a correlation  coefficient.  Here  the  section  in  R.  A.  Fisher’s 
“Statistical  Methods  for  Research  Workers”  on  “The  Significance 
of  an  Observed  Correlation”  (pp.  157-161)  will  be  found  helpful. 
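
The probable error formula is easily put into a short routine. This is a minimal sketch; the function name is an assumption, and the trial values (r = .522 with N = 299, and r = .055 with N = 500) are figures that appear elsewhere in this chapter.

```python
import math


def probable_error_of_r(r, n):
    """Probable error of a correlation coefficient: .67449 (1 - r^2) / sqrt(N)."""
    return 0.67449 * (1 - r ** 2) / math.sqrt(n)


# For comparison with other observed coefficients, use the observed r.
print(round(probable_error_of_r(0.522, 299), 3))
# For a test against zero, put r = 0 in the formula.
print(round(probable_error_of_r(0.0, 299), 3))
# Penny-tossing table with no pennies left down (500 pairs of throws).
print(round(probable_error_of_r(0.055, 500), 3))
```

The routine applies only to the case stated in the text, N of about 25 or more; it makes no allowance for very small samples.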

The method of calculating the coefficient of correlation r will
now be described. The method here given is a short one worked
out as to its details in this laboratory. In principle it is the
same as short methods which have been described by other
workers, but possesses some advantages in practical computation
over any that have come to the writer’s notice. For a detailed
account of the arithmetic of the old direct product-moment method
of determining a coefficient of correlation, see Yule.2

As an example we may take Table 75 giving the correlation between
skull length and brain-weight. This table is repeated, with
the arithmetic of the first steps in the computations, as Table 76.

First we may consider the notation used, which is identical
with that in the preceding chapter on the measurement of variation.
The marginal total arrays of the table are designated

$z_x$ = frequency in the several brain-weight classes.
$z_y$ = frequency in the several skull length classes.


TABLE 76

Showing the Steps in the Calculation of a Correlation Coefficient

                               Brain-weight (grams)
Skull       1000-  1100-  1200-  1300-  1400-  1500-  1600-  1700-  1800-   Totals
length       1099   1199   1299   1399   1499   1599   1699   1799   1899      z_y    y   z_y y  z_y y^2   z_xy x   z_xy xy
(mm.)
155-159        ..     ..      1      1     ..     ..     ..     ..     ..        2   -3      -6       18       -3        +9
160-164        ..     ..      2      6      4      2     ..     ..     ..       14   -2     -28       56       -8       +16
165-169         1     ..      9     10     18      3      1     ..     ..       42   -1     -42       42      -27       +27
170-174        ..     ..      5     19     28     11      4      1     ..       68    0       0        0       -7         0
175-179        ..     ..      4     19     29     23      4     ..     ..       79   +1     +79       79       +4        +4
180-184        ..     ..     ..     10     19     23      8      1     ..       61   +2    +122      244      +32       +64
185-189        ..     ..     ..      1      2     12      4     ..     ..       19   +3     +57      171      +19       +57
190-194        ..     ..     ..     ..      1      2      3      4     ..       10   +4     +40      160      +20       +80
195-199        ..     ..     ..     ..     ..      1      1     ..      2        4   +5     +20      100      +11       +55
Totals z_x      1     ..     21     66    101     77     25      6      2      299         +242      870      +41      +312

x              -4     -3     -2     -1      0     +1     +2     +3     +4
z_x x          -4     ..    -42    -66      0    +77    +50    +18     +8      +41
z_x x^2        16     ..     84     66      0     77    100     54     32      429

x denotes deviations, in class units of 100 grams each, of each
brain-weight class from the arbitrary origin (x = 0) at the midpoint
(1450) of the brain-weight class 1400-1499.

y denotes  deviations  in  class  units  of  5 mm.  each,  of  each 
skull  length  class  from  its  arbitrary  origin  at  172.5  mm. 

We need as the first step to get the means and standard deviations
for the two variables. Proceeding just as in Chapter XIII,
we have:

\[ \nu_{1x} = \frac{S(z_x x)}{S(z_x)} = \frac{41}{299} = .137124 \]

\[ \nu_{2x} = \frac{S(z_x x^2)}{S(z_x)} = \frac{429}{299} = 1.434783 \]

Omitting Sheppard’s corrections for the sake of simplicity, we
then have

\[ \pi_{2x} = 1.434783 - (.137124)^2 = 1.415980, \]

whence

\[ \sigma_x = \sqrt{\pi_{2x}} = 1.189950 \text{ in class units.} \]

We then have

Mean brain-weight = 1450 + (100 × .1371) = 1463.71 ± 4.64 grams.
Standard deviation (in brain-weight) = 100 × 1.18995 = 119.00 ± 3.28 grams.

Similarly for skull length we have:

\[ \nu_{1y} = \frac{S(z_y y)}{S(z_y)} = \frac{242}{299} = .809365 \]

\[ \nu_{2y} = \frac{S(z_y y^2)}{S(z_y)} = \frac{870}{299} = 2.909699 \]

\[ \pi_{2y} = 2.909699 - (.809365)^2 = 2.254627 \]

\[ \sigma_y = \sqrt{2.254627} = 1.501542 \text{ in class units.} \]

Mean skull length = 172.5 + (.809365 × 5) = 176.55 ± .29 mm.
Standard deviation (in skull length) = 5 × 1.501542 = 7.51 ± .21 mm.
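
The same figures can be had from the marginal totals by a short routine. The sketch below is an illustration only: the function name and data layout are assumptions, and the probable errors are computed on the assumption that the usual formulas apply (.67449σ/√N for a mean and .67449σ/√(2N) for a standard deviation), which agree with the values printed above.

```python
import math

PE_FACTOR = 0.67449


def coded_summary(freqs, origin, unit):
    """Mean and standard deviation from class frequencies at coded deviations.

    `freqs` maps each deviation from the arbitrary origin (in class units)
    to its class frequency; `unit` is the class width in the original scale.
    """
    n = sum(freqs.values())
    nu1 = sum(d * f for d, f in freqs.items()) / n       # first moment
    nu2 = sum(d * d * f for d, f in freqs.items()) / n   # second moment
    sigma_units = math.sqrt(nu2 - nu1 ** 2)              # no Sheppard's correction
    sd = unit * sigma_units
    return {
        "n": n,
        "mean": origin + unit * nu1,
        "sd": sd,
        "pe_mean": PE_FACTOR * sd / math.sqrt(n),
        "pe_sd": PE_FACTOR * sd / math.sqrt(2 * n),
    }


# Marginal totals of the correlation table, keyed by coded deviation.
brain = {-4: 1, -2: 21, -1: 66, 0: 101, 1: 77, 2: 25, 3: 6, 4: 2}
skull = {-3: 2, -2: 14, -1: 42, 0: 68, 1: 79, 2: 61, 3: 19, 4: 10, 5: 4}

bw = coded_summary(brain, origin=1450, unit=100)
sl = coded_summary(skull, origin=172.5, unit=5)
print({k: round(v, 2) for k, v in bw.items()})
print({k: round(v, 2) for k, v in sl.items()})
```

Carried to full precision the brain-weight standard deviation comes out a shade under 119.00, the small difference arising from the rounding of the intermediate figures in the hand computation.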


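Now in proceeding to get the coefficient of correlation we may
first break it up into this form,

\[ r_{12} = \frac{S(xy)}{N \sigma_1 \sigma_2} = \frac{S(xy)}{N} \times \frac{1}{\sigma_1 \sigma_2}, \]

and determine first $S(xy)/N$. Call $S(xy)/N = A$, and $\sigma_1 \sigma_2 = B$,
so that $r_{12} = A/B$.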

Suppose the row designated x at the bottom of the table and
surrounded by a heavy line frame to be movable. Then suppose
it to be moved up on the table till it rests just under the first
brain-weight array (the first frequency row in the table, corresponding
to a skull length of 157.5). Then multiply each cell
frequency ($z_{xy}$) in that array by the number in the x row which
falls directly under that cell, having regard to the sign of the x
always. We shall have

1 × (-2) = -2
1 × (-1) = -1
Sum      = -3

This -3 is the first entry in the marginal column to the right
of the table headed $z_{xy}x$.

Now slide the movable x row down one array till it is just below
the brain-weight array corresponding to skull length 162.5, and
repeat the same process as before. We have:

2 × (-2) = -4
6 × (-1) = -6
4 × 0    = 0
2 × (+1) = +2
Sum      = -8

This -8 is the second entry in the $z_{xy}x$ column.

Let this process be repeated for each of the brain-weight
arrays. The results will be those seen in the $z_{xy}x$ column. When
completed the algebraic sum of this is found to be +41. This
will be seen to agree with the sum of the row at the bottom of the
table headed $z_x x$. This agreement between these two sums
must always be exact, and furnishes an important check on the
correctness of the work. If they do not agree a mistake has been
made and one should proceed no farther till it has been found
and corrected.

Now what we have so far is the product of each elemental cell
frequency ($z_{xy}$) by the deviation of its position from the arbitrary
origin of the x variable. The next step is to multiply in the
deviation of the cell from the arbitrary origin of the y variable.
This is done in the last column to the right, headed $z_{xy}xy$.

Thus we have

(-3) × (-3)  = +9
(-2) × (-8)  = +16
(-1) × (-27) = +27
0 × (-7)     = 0
(+1) × (+4)  = +4
and so on.

The sum of this column, $S(z_{xy}xy)$, is the product moment of
the table, referred to the arbitrarily chosen axes of origin. We
need, just as with simple frequency distributions, to transfer this
to the mean as origin, and the method of doing so is in principle
just the same, namely, by shifting its value by an amount equal
to the product of the two first moments ($\nu_{1x}$ and $\nu_{1y}$) about the
arbitrary origin. Remembering that, in the notation used above,

\[ r_{12} = \frac{S(xy)}{N \sigma_1 \sigma_2} = \frac{A}{B}, \]

we have the rule for transferring to the mean that

\[ A = \frac{S(z_{xy}xy)}{N} - \nu_{1x}\nu_{1y}. \]

In  the  present  example 

A = S - vlXviy  = - (.137124  X .809365) 

= 1.043478  - .110983 
= + .932495 


Remembering always that we are computing in terms of class
units of grouping

$$B = \sigma_1\sigma_2 = 1.189950 \times 1.501542 = 1.786760$$

Whence finally

$$r_{12} = \frac{+.932495}{1.786760} = +.522 \pm .028.$$


The  probable  error  of  ± .028  is  the  one  to  be  used  in  comparing 
this  coefficient  + .522  with  other  observed  correlation  coefficients. 
If  one  wished  to  test  whether  + .522  is  itself  significantly  different 
from  zero  the  proper  probable  error  for  the  purpose  is  ± .039. 
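
The arithmetic of this final step can be retraced in a few lines. The sketch below is only an illustration: it assumes that the two probable errors quoted are the customary .67449(1 - r²)/√N and .67449/√N, an assumption which does reproduce the printed ± .028 and ± .039; the variable names are chosen here and are not part of the text.

```python
import math

N = 299                  # total frequency of the correlation table
A = 0.932495             # product moment about the mean, in class units
B = 1.189950 * 1.501542  # sigma_1 * sigma_2, in class units

r = A / B

# Assumed formula: probable error for comparing r with other coefficients.
pe_compare = 0.67449 * (1 - r**2) / math.sqrt(N)
# Assumed formula: probable error for testing whether r differs from zero.
pe_zero = 0.67449 / math.sqrt(N)

print(f"r = {r:.3f}, P.E. = {pe_compare:.3f}, P.E. (zero test) = {pe_zero:.3f}")
```

The coefficient is about 13 times the second probable error, so there is no doubt of its significance.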

While it has taken a good deal of space to describe this process,
it is, in fact, a very simple matter to calculate a correlation
coefficient, and by the method here described takes but a short time.

Let  us  consider  now  the  regression  coefficients.  These  are  two 
quantities  defined  as  follows: 


$$b_1 = r_{12}\frac{\sigma_1}{\sigma_2}$$

$$b_2 = r_{12}\frac{\sigma_2}{\sigma_1}$$

These quantities measure the slopes of the regression lines (cf.
Fig. 86 supra). That is

$$x = b_1 y$$
$$y = b_2 x$$

Let  subscript  1 denote  the  brain-weight  or  x variable,  and 
subscript  2 denote  the  skull  length  or  y variable,  and  x denote  the 
deviation  of  the  mean  of  a brain-weight  array  from  the  mean  brain- 
weight  of  the  whole  sample,  and  y the  deviation  of  the  mean  of  a 
skull  length  array  from  the  mean  skull  length  of  the  whole  sample. 

Then in our example

$$b_1 = .521892 \times \frac{118.995}{7.508} = 8.272$$

Whence

$$x = 8.272\,y.$$


But x and y are deviations from the means of brain-weight and
skull length respectively. We shall do better to work with
absolute values rather than deviations. Doing so, we have,

x = (X - 1463.7)
y = (Y - 176.5)

So then,

X - 1463.7 = 8.272 (Y - 176.5).

Simplifying,  we  get 

Mean  brain-weight  (in  grams)  = 3.7  + 8.272  skull  length  (in  mm.). 

This  is  the  equation  of  the  regression  line  CD  of  Fig.  86.  It 
expresses  the  regression  of  brain-weight  on  skull  length. 

Proceeding in the same way for the regression of skull length
on brain-weight we have

$$b_2 = .521892 \times \frac{7.508}{118.995} = .033.$$

y = .033 x

Y - 176.5 = .033 (X - 1463.7)

Mean skull length (in mm.) = 128.2 + .033 brain-weight (in grams).

This is the equation of the line AB in Fig. 86.

This  completes  the  essential  mathematical  treatment  of  simple 
two-variable  correlation  with  linear  regression. 
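
As a check on the two regression equations, the following sketch recomputes the slopes and intercepts from the constants quoted above. The slopes are rounded to three decimals before the intercepts are formed, because that is evidently how the printed 3.7 and 128.2 were reached; this rounding step is an assumption made here to match the text, not a recommendation.

```python
r = 0.521892
sigma_x, sigma_y = 118.995, 7.508   # brain-weight (grams), skull length (mm.)
mean_x, mean_y = 1463.7, 176.5

# Slopes, rounded as in the text before any further use.
b1 = round(r * sigma_x / sigma_y, 3)   # brain-weight on skull length
b2 = round(r * sigma_y / sigma_x, 3)   # skull length on brain-weight

# Each line passes through the point of means.
intercept_x = mean_x - b1 * mean_y
intercept_y = mean_y - b2 * mean_x

print(f"Mean brain-weight = {intercept_x:.1f} + {b1} skull length")
print(f"Mean skull length = {intercept_y:.1f} + {b2} brain-weight")
```

Note that the product of the two unrounded slopes is r², which is why the two lines coincide only when the correlation is perfect.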

ILLUSTRATION  OF  CORRELATION  IN  HUMAN  MATERIAL 

In  order  to  give  some  idea  of  the  extent  to  which  various 
human  characteristics  are  correlated  Table  77  is  presented.  It 
gives  the  values  of  the  coefficient  of  correlation  for  a number  of 
representative  characters.  It  represents  only  a small  fraction  of 
the  large  number  of  correlations  for  human  characters  which  are 
now  known.  In  considering  the  values  in  this  table  it  must  be 
remembered,  from  principles  already  stated,  that  if  a correlation 
coefficient  is  not  4 or  more  times  its  probable  error  it  cannot  be 
asserted  to  be  certainly  different  from  zero,  though  if  it  is  3 times 
the  probable  error  it  is  probably  so. 
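
The rule just stated is easily applied mechanically. The sketch below takes three coefficients from Table 77 and sorts them by the ratio of each to its probable error; the cut-offs of 3 and 4 are those of the text, while the function name and the wording of the three verdicts are merely illustrative.

```python
def verdict(r, probable_error):
    """Return the ratio |r| / P.E. and a verdict following the 3x / 4x rule."""
    ratio = abs(r) / probable_error
    if ratio >= 4:
        label = "certainly different from zero"
    elif ratio >= 3:
        label = "probably different from zero"
    else:
        label = "not shown to differ from zero"
    return ratio, label

examples = {
    "Temperature (oral) and pulse rate": (0.288, 0.020),
    "Age (adults) and respiration rate": (0.077, 0.022),
    "Heart weight and brain weight": (0.08, 0.08),
}

for name, (r, pe) in examples.items():
    ratio, label = verdict(r, pe)
    print(f"{name}: {ratio:.1f} times P.E., {label}")
```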
TABLE 77

Correlation in Man

Correlated characters.                                                        Coefficient of
                                                                              correlation.

Age (adults) and temperature (oral) ¹                                         -.150 ± .022
Age (adults) and pulse rate ¹                                                 +.121 ± .022
Age (adults) and respiration rate ¹                                           +.077 ± .022
Age (adults) and body weight ¹                                                +.136 ± .030
Temperature (oral) and pulse rate ¹                                           +.288 ± .020
Temperature (oral) and respiration rate ¹                                     +.142 ± .022
Temperature (oral) and height ¹                                               +.003 ± .022
Temperature (oral) and body weight ¹                                          +.043 ± .022
Pulse rate and respiration rate ¹                                             +.060 ± .022
Pulse rate and height ¹                                                       -.078 ± .022
Pulse rate and body weight ¹                                                  +.114 ± .022
Respiration rate and height ¹                                                 -.144 ± .022
Respiration rate and body weight ¹                                            -.089 ± .022
Corrected death rates from (a) cancer of the liver, and (b) cancer of the
  stomach (Switzerland) ²                                                     +.161 ± .140
Corrected death rates from (a) cancer of the stomach, and (b) cancer of the
  rectum and intestines (Switzerland) ²                                       +.263 ± .134
Occupation and cancer mortality (occupied and retired males, 1900-2,
  weighted) ³                                                                 +.40  ± .06
Weight and length of infants at birth (males) ⁴                               +.644 ± .012
Body weight and height (adult males) ⁴                                        +.486 ± .016
Strength of pull and height (adult males) ⁴                                   +.303 ± .019
Strength of pull and body weight (adult males) ⁴                              +.545 ± .015
Length of first joint of forefinger in (a) right hand, and (b) left hand ⁵    +.925 ± .004
Stature in (a) brother and (b) sister ⁶                                       +.375 ± .017
Cephalic index in (a) brother, and (b) sister ⁶                               +.340 ± .050
Birth rate and infant death rate (London, 1901) ⁷                             +.51  ± .10
Birth rate and poverty rate ⁸                                                 +.420 ± .047
Infant mortality and artificial feeding rate ⁸                                +.760 ± .029
Heart weight and body weight ⁹                                                +.65  ± .04
Heart weight and kidney weight ⁹                                              +.56  ± .05
Heart weight and liver weight ⁹                                               +.52  ± .06
Heart weight and brain weight ⁹                                               +.08  ± .08
Obstetric conjugate and inter-crests diameters of pelvis ¹⁰                   +.17  ± .04
Obstetric conjugate and inter-spines diameters of pelvis ¹⁰                   +.13  ± .05
Obstetric conjugate and transverse diameters of pelvis ¹⁰                     +.07  ± .05
Obstetric conjugate and diagonal conjugate diameters of pelvis ¹⁰             +.91  ± .01
Obstetric conjugate and antero-posterior diameters of pelvis ¹⁰               +.30  ± .04
Duration of life of (a) father, and (b) adult son ¹¹                          +.135 ± .021
Duration of life of (a) father and (b) minor son ¹¹                           +.087 ± .022
Duration of life of (a) father, and (b) adult daughter ¹¹                     +.130 ± .020
Duration of life of (a) mother, and (b) adult son ¹¹                          +.131 ± .019
Duration of life of (a) mother, and (b) adult daughter ¹¹                     +.149 ± .020
Duration of life of (a) adult brother and (b) adult brother ¹¹                +.285 ± .020
Duration of life of (a) adult sister and (b) adult sister ¹¹                  +.332 ± .019
Vaccination and recovery from smallpox ¹²                                     +.656 ± .009
Lung capacity and body weight (age 19, males) ¹³                              +.62  ± .02
Number of decayed teeth and use of tooth-brush (boys) ¹⁴                      +.074 ± .030
Mean age at death of (a) husband, and (b) wife ¹⁵                             +.224 ± .022

¹ Whiting, M. H.: Biometrika 11:11, 1915-17.
² Brown, J. W., and Lal, Mohan: J. Hyg. 14:192, 1914.
³ Greenwood, M., and Wood, Frances: Proc. Roy. Soc. Med. 8 (Sect. Epidemiology):119, 1914.
⁴ Pearson, Karl: Proc. Roy. Soc. Lond. 66:25, 1899-1900.
⁵ Whiteley, M. A., and Pearson, Karl: Proc. Roy. Soc. Lond. 65:130, 1899.
⁶ Fawcett, Cicely D., and Pearson, Karl: Proc. Roy. Soc. Lond. 62:415, 1898.
⁷ Heron, David: On the Relation of Fertility in Man to Social Status, London, Dulau & Co., 1906.
⁸ Greenwood, M.: Eugenics Rev. 4:248, 1912-
⁹ Greenwood, M., and Brown, J.: Biometrika
rika 1:60, 1901-02.

SKEW CORRELATION AND NON-LINEAR REGRESSION. THE CORRELATION RATIO

So  far  we  have  dealt  only  with  two-variable  correlation  where 
the  means  of  the  arrays  fall  upon  a straight  line,  within  the  errors 
of  sampling.  It  will  be  at  once  obvious  to  any  biologist  that 
there  are  many  cases  in  nature  in  which  this  condition  is  not  at  all 
closely  approached.  An  example  is  the  correlation  between  a bodily 
characteristic  and  age  during  the  growing  period  of  the  organism; 
the  data,  in  short,  which  lead  to  a growth  curve. 

Pearson³ has dealt with non-linear regression under the
designation of skew correlation, and devised a satisfactory method of
measuring the correlation or association in such cases. In the first
place it is apparent that such a constant as

$$r_{12} = \sqrt{b_1 \cdot b_2}$$

fails wholly in such a case as that of a growth curve, because $b_1$
and $b_2$ no longer have the simple meaning they did in linear
regression.

Pearson, therefore, proposed a new constant, the correlation
ratio, conventionally denoted by the Greek letter eta ($\eta$). Let us
now try to explain, with a minimum of mathematical notation,
just what this constant means.

Going  back  to  Table  75  it  must  be  apparent  to  anyone  that 
each  array  of  such  a table  may  be  treated  biometrically  as  a separate 
frequency  distribution.  Thus  the  array  of  brain-weights  associated 
with  skull  lengths  170-174  mm.  is  as  follows: 


Brain-weight.     Frequency.

1200-1299              5
1300-1399             19
1400-1499             28
1500-1599             11
1600-1699              4
1700-1799              1

Total                 68


For this, or any other similar array distribution, we can, if it
is desired, compute in the regular way the mean and the standard
deviation. The former will measure the type of the array, and the
latter the variability of the array. Suppose we calculate in this
way the standard deviation, measuring the variability, of each
brain-weight array in the table. We shall then have a series of
9 standard deviations. If we add these together and divide by 9
we shall have as the result the unweighted mean variability of
brain-weight arrays associated with particular skull lengths. If
we multiply each standard deviation of an array by the total
frequency in that array, add up the results and divide by 299, the
sum of all the frequencies in all arrays, the result will be the
weighted mean variability of arrays of brain-weight associated with
particular skull lengths.

Plainly, from mere inspection of the table, this weighted mean
variability of brain-weight arrays will be smaller than the
variability of brain-weight in general over the whole table, provided
there is any correlation or association between brain-weight and
skull length. One can see at once that no single row (i.e., brain-weight
array) of Table 75 shows as great a scatter or variability
as does the total row for all brain-weights at the bottom of the
table. It follows that if no single row is as variable as the total,
the average variability of all single rows must be less than the
variability of the total.

Suppose now we define a quantity $\eta$ as follows:

$$\sigma_{ax}^2 = (1 - \eta^2)\,\sigma_x^2, \qquad \text{(i)}$$

where $\sigma_{ax}$ is the weighted mean variability of the single arrays,
of which we have just been speaking, and $\sigma_x$ is the total
variability of the same variable.

Thus $\eta^2$ plainly is the ratio of reduction of average variability
of an array below the variability of the sample as a whole when
these variabilities are expressed as squared standard deviations.
Now one can see by studying again Tables 62 to 74 supra that when
the correlation or association between the two variables is high $\sigma_{ax}$ is
bound to be small as compared with $\sigma_x$, and consequently $\eta$ will
be large. When, on the other hand, the correlation is low, $\sigma_{ax}$
will be of the same order of magnitude as $\sigma_x$, and $\eta$ will necessarily
be small. Therefore it follows that $\eta$ may be used as a measure
of the degree of correlation existing in a particular case, quite
regardless of whether the regression is linear or not. When the
regression is strictly linear $\eta$ will be equal to r.

The value of the correlation ratio may be computed in either
of two ways. One may proceed in just the manner outlined above,
getting the standard deviation, or rather the second moment about
the mean of each array, determining their weighted average, and
then applying equation (i) to determine $\eta$.

A shorter method is, however, more commonly used. From
equation (i)

$$\eta^2 = \frac{\sigma_x^2 - \sigma_{ax}^2}{\sigma_x^2}$$

Take a new quantity

$$\sigma_{mx}^2 = \sigma_x^2 - \sigma_{ax}^2$$

It can be shown that this quantity $\sigma_{mx}$ is the standard deviation
of the means of arrays, and therefore easily determined because we
already have the means of the arrays for the purpose of plotting
regression lines. So then we have

$$\eta = \frac{\sigma_{mx}}{\sigma_x}$$
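
The statement that the difference of the two squared variabilities is the squared standard deviation of the means of arrays may be verified as follows. The notation of this sketch is introduced here for the purpose and is not that of the text: the arrays are numbered $j$, with frequency $n_j$, mean $m_j$ and standard deviation $s_j$; $M$ is the mean of the whole sample and $N = \sum_j n_j$. The mean variability of arrays is taken as the weighted mean of the squared standard deviations, as equation (i) requires.

```latex
\begin{align*}
N\sigma_x^2 &= \sum_j \sum_i (x_{ij} - M)^2
             = \sum_j \sum_i \bigl[(x_{ij} - m_j) + (m_j - M)\bigr]^2 \\
            &= \sum_j \sum_i (x_{ij} - m_j)^2
             + 2\sum_j (m_j - M) \sum_i (x_{ij} - m_j)
             + \sum_j n_j (m_j - M)^2 \\
            &= \sum_j n_j s_j^2 + \sum_j n_j (m_j - M)^2,
\end{align*}
% the middle term vanishes because deviations from an array's own mean sum to zero.
\[
\sigma_x^2
  = \underbrace{\frac{1}{N}\sum_j n_j s_j^2}_{\sigma_{ax}^2}
  + \underbrace{\frac{1}{N}\sum_j n_j (m_j - M)^2}_{\sigma_{mx}^2}.
\]
```

The second term is exactly what the columns of Table 78 accumulate: each squared deviation of an array mean from the general mean, weighted by the frequency of the array.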


Let us take as a first numerical example of the computation
of the correlation ratio the brain-weight and skull length case of
Table 75. The work is shown in Table 78.


TABLE 78

Calculation of Correlation Ratio from Data of Table 75

Skull length   Means of the x arrays       x          x²        z_y      z_y x²
classes.       (brain-weight).

155-159             1300                 -164       26,896        2       53,792
160-164             1393                 - 71        5,041       14       70,574
165-169             1386                 - 78        6,084       42      255,528
170-174             1440                 - 24          576       68       39,168
175-179             1455                 -  9           81       79        6,399
180-184             1502                 + 38        1,444       61       88,084
185-189             1550                 + 86        7,396       19      140,524
190-194             1650                 +186       34,596       10      345,960
195-199             1725                 +261       68,121        4      272,484

Totals                                                          299    1,272,513

$$\sigma_{mx} = \sqrt{\frac{1{,}272{,}513}{299}} = \sqrt{4255.896} = 65.237$$

$\sigma_x = 118.995$ (from p. 380 supra)

$$\eta_{xy} = \frac{\sigma_{mx}}{\sigma_x} = \frac{65.237}{118.995} = .548$$
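
The work of Table 78 amounts to one weighted sum of squares and one square root. The following sketch repeats it from the array means and frequencies of the table; the general mean is taken as 1464, as the x column of the table implies, and the function name is an invention of this illustration.

```python
import math

# (mean brain-weight of array, frequency) for the nine skull-length classes.
ARRAYS = [(1300, 2), (1393, 14), (1386, 42), (1440, 68), (1455, 79),
          (1502, 61), (1550, 19), (1650, 10), (1725, 4)]
GRAND_MEAN = 1464    # general mean brain-weight, rounded as in Table 78
SIGMA_X = 118.995    # total standard deviation of brain-weight

def correlation_ratio(arrays, grand_mean, sigma_total):
    """Return (sigma of array means, eta) by the shorter method."""
    n = sum(freq for _, freq in arrays)
    weighted_ss = sum(freq * (mean - grand_mean) ** 2 for mean, freq in arrays)
    sigma_means = math.sqrt(weighted_ss / n)
    return sigma_means, sigma_means / sigma_total

sigma_mx, eta_xy = correlation_ratio(ARRAYS, GRAND_MEAN, SIGMA_X)
print(f"sigma_mx = {sigma_mx:.3f}, eta_xy = {eta_xy:.3f}")
```

The same function serves for any table of array means, the only change being the three arguments supplied.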

It is evident that the whole process of getting $\eta$ might equally
well have been carried out on the skull-length variabilities. Would
the result have been the same? There is no way to find out equal
to trying, which is done in Table 79.

TABLE 79

Alternative Calculation of Correlation Ratio from Data of Table 75

Brain-weight   Means of the       y         y²        z_x      z_x y²
classes.       y arrays.

1000-1099        167.5           9.0      81.00         1       81.00
1100-1199          *
1200-1299        169.6           6.9      47.61        21      999.81
1300-1399        173.8           2.7       7.29        66      481.14
1400-1499        175.0           1.5       2.25       101      227.25
1500-1599        179.7           3.2      10.24        77      788.48
1600-1699        182.1           5.6      31.36        25      784.00
1700-1799        187.5          11.0     121.00         6      726.00
1800-1899        197.5          21.0     441.00         2      882.00

Totals                                                299     4969.68

$$\sigma_{my} = \sqrt{\frac{4969.68}{299}} = \sqrt{16.621} = 4.077$$

$\sigma_y = 7.508$

$$\eta_{yx} = \frac{\sigma_{my}}{\sigma_y} = \frac{4.077}{7.508} = .543$$

It is seen that $\eta_{yx}$ is substantially the same as $\eta_{xy}$ and that
both are practically the same as $r_{xy}$ from the same data, its value
being .522 ± .028. Thus it appears from analytic as well as
visual evidence that the regressions of Table 75 are linear.

Let  us  take  another  example  where  the  regression  is  more 
evidently  non-linear.  Such  a case  is  furnished  in  Table  80,  the 
data  of  which  are  taken  from  Streeter,*  using  only  embryos  below 
400  grams  in  weight. 


* Streeter, G. L.: Weight, Sitting Height, Head Size, Foot Length, and
Menstrual Age of the Human Embryo, Carnegie Institution of Washington
Publication No. 274, pp. 143-170.

From this table it is at once evident that sitting height does
not increase in a linear manner as weight increases.

Calculated in the manner described earlier in this chapter, the
correlation coefficient is

r = .9440 ± .0034.

The computation of the correlation ratio $\eta$ from the same data
is given in Table 81.

TABLE 81

Correlation Ratio: Weight and Sitting Height of Embryos

Type of array   Mean of array m_x    m_x - M_x    (m_x - M_x)²    z_y    z_y (m_x - M_x)²
(weight).       (sitting height).

     10              1.4217           -3.5034       12.2738        83       1018.73
     30              2.5926           -2.3325        5.4406        54        293.79
     50              3.5750           -1.3501        1.8228        40         72.91
     70              4.1000           -0.8251         .6808        30         20.42
     90              4.7037           -0.2214         .0490        27          1.32
    110              5.1739           +0.2488         .0619        23          1.42
    130              5.6818           +0.7567         .5726        22         12.60
    150              6.1333           +1.2082        1.4597        15         21.90
    170              6.5000           +1.5749        2.4803        22         54.57
    190              6.9091           +1.9840        3.9363        11         43.30
    210              7.1000           +2.1749        4.7302        10         47.30
    230              7.5000           +2.5749        6.6301        16        106.08
    250              7.6875           +2.7624        7.6309        16        122.09
    270              7.8750           +2.9499        8.7019        16        139.23
    290              8.0714           +3.1463        9.8992        14        138.59
    310              8.2500           +3.3249       11.0550        16        176.88
    330              8.2000           +3.2749       10.7250         5         53.63
    350              8.9000           +3.9749       15.7998        10        158.00
    370              8.8000           +3.8749       15.0149        10        150.15
    390              9.0714           +4.1463       17.1918        14        240.69

Totals          Mean = 4.9251 = M_x                               454       2873.60

$$\sigma_{mx} = \sqrt{\frac{2873.60}{454}} = \sqrt{6.3295} = 2.5158$$

$\sigma_x = 2.5661$

$$\eta_{xy} = \frac{2.5158}{2.5661} = .9804$$


The question will arise in the reader’s mind: Is $\eta$ significantly
different from r? To the eye the regression is plainly non-linear,
but we have

$\eta$ = .9804
r = .9440
Difference = .0364


This is absolutely a small difference. Is it significant in
comparison with its probable error? To answer this question resort is
necessary to the methods developed by Blakeman⁴ for testing the
significance of the difference between $\eta$ and r. Of the several
tests proposed by Blakeman we may take as the most useful,
considering ease of computation,

$$\text{P. E.}_{\zeta} = 2\chi_1\sqrt{\zeta}\,\sqrt{(1 - \eta^2)^2 - (1 - r^2)^2 + 1},$$

where

$$\zeta = \eta^2 - r^2$$

$\chi_1 = .67449/\sqrt{N}$, and is given in Pearson’s Tables.

In the present example we have:

$$\text{P. E.}_{\zeta} = 2 \times .03166 \times .2648\,\sqrt{.0388^2 - .1089^2 + 1}$$
$$= .01677\sqrt{.9896} = .01677 \times .9948 = .017$$
$$\zeta = \eta^2 - r^2 = .961 - .891 = .070 \pm .017$$

$\zeta$ is 4.1 times its probable error, and therefore to be regarded
as significant. We may then conclude that the regression of sitting
height on weight is non-linear.
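
The test may be set out as a small routine. The sketch below assumes the form of Blakeman's expression used above, namely that the probable error of ζ = η² - r² is 2χ₁√ζ·√[(1 - η²)² - (1 - r²)² + 1] with χ₁ = .67449/√N; the function name is illustrative only. Because no intermediate figures are rounded, the ratio comes out at about 4.2 rather than the 4.1 of the text.

```python
import math

def blakeman_test(eta, r, n):
    """Return zeta = eta^2 - r^2, its probable error, and their ratio."""
    zeta = eta**2 - r**2
    chi_1 = 0.67449 / math.sqrt(n)
    radicand = (1 - eta**2) ** 2 - (1 - r**2) ** 2 + 1
    probable_error = 2 * chi_1 * math.sqrt(zeta) * math.sqrt(radicand)
    return zeta, probable_error, zeta / probable_error

# Embryo data: eta = .9804, r = .9440, N = 454.
zeta, pe_zeta, ratio = blakeman_test(0.9804, 0.9440, 454)
print(f"zeta = {zeta:.3f} ± {pe_zeta:.3f}, ratio = {ratio:.1f}")
```

The routine presupposes that η is not less than r, as must be the case apart from errors of arithmetic; otherwise the square root of ζ would fail.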

CORRECTION  FOR  CORRELATION  RATIO 

It is important to remember when using the correlation ratio
$\eta$ that, as shown by Pearson,⁵ in samples from material in which $\eta$
is actually zero, the mean value of $\eta$ from samples will be $\sqrt{(\kappa - 1)/N}$,
where $\kappa$ is the number of arrays involved in calculating $\eta$ and
N is the size of the sample. It is evident, therefore, that in any
value of $\eta$ actually obtained from a sample, there needs to be
some correction to allow for the influence of number of arrays.
Pearson⁶ has suggested that

$$\frac{\text{Observed } \eta^2 - (\kappa - 3)/N}{1 - (\kappa - 3)/N}$$

is a reasonable value for the $\eta^2$ of the sampled population, provided
N is fairly large.
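
To see how much such a correction amounts to in practice, the sketch below applies it to the embryo example of the preceding section (η = .9804, 20 arrays, N = 454). It assumes that the suggested value is [observed η² - (κ - 3)/N] / [1 - (κ - 3)/N], and that the mean η for uncorrelated material is √[(κ - 1)/N]; both function names are inventions of this illustration, and the application to the embryo data is not made in the text.

```python
import math

def corrected_eta_squared(eta_sq_observed, k, n):
    """Assumed form of the correction: remove the share of eta^2 due to k arrays."""
    allowance = (k - 3) / n
    return (eta_sq_observed - allowance) / (1 - allowance)

def null_mean_eta(k, n):
    """Mean eta expected from samples of uncorrelated material."""
    return math.sqrt((k - 1) / n)

eta_observed = 0.9804
k_arrays, n_sample = 20, 454

eta_sq_corrected = corrected_eta_squared(eta_observed**2, k_arrays, n_sample)
print(f"corrected eta = {math.sqrt(eta_sq_corrected):.4f}")
print(f"mean eta for zero correlation = {null_mean_eta(k_arrays, n_sample):.4f}")
```

With so high an observed value and so large a sample the correction is slight; it becomes important chiefly when η is small and the arrays are many.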

“Of course the first consideration in any investigation of $\eta^2$
is to determine whether it is comparable with $(\kappa - 1)/N$. If it
be less than this value we cannot assert significant association.
If it be greater than this value we have to consider whether $\eta$ as
observed differs considerably from

$$\sqrt{(\kappa - 1)/N} \pm .67449\,\frac{1}{\sqrt{N}}$$

and for general purposes we must settle whether $\eta$ differs from
$\sqrt{(\kappa - 1)/N}$ by, say, $1.7/\sqrt{N}$.”

SUGGESTED  READING 

1. Darbishire, A. D.: Some Tables for Illustrating Statistical Correlation, Mem. and
   Proc. Manchester Lit. and Phil. Soc., vol. 51, pp. (of reprint) 1-20, 1907.

2. Yule, G. U.: Introduction to the Theory of Statistics, Chapters IX, X, and XI.

3. Pearson, K.: Mathematical Contributions to the Theory of Evolution. XIV.
   On the General Theory of Skew Correlation and Non-linear Regression, Draper’s
   Company Research Mem. Biometric Series II, Cambridge (University Press),
   1905.

4. Blakeman, J.: On Tests for Linearity of Regression in Frequency Distribution,
   Biometrika, vol. 4, pp. 332-350, 1905.

5. Pearson, K.: On a Correction to Be Made to the Correlation Ratio, Biometrika,
   vol. 8, pp. 254-256, 1911.

6. Pearson, K.: On the Correction Necessary for the Correlation Ratio, Ibid., vol.
   14, pp. 412-417, 1923.

7. Pearson, K.: Notes on the History of Correlation, Ibid., vol. 13, pp. 25-45, 1920.
   (An excellent account of the early history of the subject of correlation.)


CHAPTER  XV 


PARTIAL  CORRELATION 

By a simple extension of the principle of two-variable correlation,
described in the last chapter, multiple and net or partial
correlations  may  be  determined.  Multiple  correlation  is  the 
correlation  between  one  variable  and  a series  of  other  variables 
taken  together.  A net  or  partial  correlation  is  the  correlation 
between  two  variables  when  a whole  series  of  other  variables  are 
held  constant.  The  epistemologic  value  of  the  method  of  partial 
correlation is great. This is evident from the following
considerations.

The  most  useful  general  method  of  acquiring  knowledge  of 
dynamic  phenomena  is  unquestionably  the  experimental  method. 
When  we  deal  with  phenomena  of  human  biology,  there  is  a wide 
range  of  matters  in  which  the  laboratory  experimental  method  is, 
in  the  nature  of  the  case,  ruled  out.  Unfortunately,  one  cannot 
breed homozygous strains of men at will for experimental purposes,
nor subject them methodically to desired environmental
conditions.  In  studying  most  problems  of  human  biology,  resort 
must  be  had  to  some  form  of  the  statistical  method.  This  is 
fundamentally  a descriptive  method,  and  hence,  in  many  of  its 
phases,  ill-adapted  to  the  analysis  of  dynamically  active  events. 

The  essence  of  the  experimental  method,  as  practised  in  the 
laboratory,  and  in  theory,  is  that,  of  the  multitude  of  variables 
conditioning  a phenomenon,  as  many  as  possible  are,  by  appropriate 
methods,  held  constant  while  one  or  at  most  a very  few  selected 
variables  are  allowed  to  vary  and  the  results  noted.  One  may 
then  deduce  the  relative  significance  of  the  selected  variable  in 
determining the phenomenon under observation. Now we frequently
hear in scientific discussions about the experiments that
nature  makes.  Actually  the  true  conditions  of  an  experiment 
are  rarely  if  ever  realized  in  the  course  of  natural  events.  It  is 
just  because  nature  permits  manifold  and  haphazard  changes  in 
all variables at the same time that recourse must be had to the
method  of  experimental  control  in  the  laboratory.  What  is  needed 
in  order  to  interpret  the  results,  in  the  experimental  sense,  and 
determine  the  meaning  of  the  manifold  and  ceaseless  changes  and 
variations  in  the  flow  of  naturally  determined  events,  is  some 
method  of  picking  out  of  the  manifold  some  selected  constant 
conditions  of  a series  of  variables,  and  then  measuring  the  extent 
and  character  of  the  variations  in  a single  selected  variable,  whose 
true  relative  influence  upon  the  phenomenon  it  is  desired  to  know, 
while  all  these  other  variables  are  held  constant.  If  this  can  be 
done  we  shall  have  realized  some  of  the  epistemologic  advantages  of 
the  experimental  method  as  practised  in  the  laboratory,  and  have 
freed  ourselves  at  the  same  time  of  the  limitations  which  in  so 
many  cases  inhere  in  the  material  itself,  and  make  the  laboratory 
type  of  experimental  inquiry  impossible.  In  other  words,  we  shall 
have let nature perform the experiment, in the sense of determining
the phenomena, in her own way, while we evaluate the
results  in  critically  analytic  terms  of  similar  sort  and  meaning  to 
those  in  which  we  evaluate  the  results  of  a laboratory  experiment. 

Now  exactly  this  epistemologic  boon  is  actually  afforded  in 
the  method  of  partial  or  net  correlation,  if  properly  handled.  This 
calculus enables one, out of a manifold complex of variables operating
in an entirely uncontrolled and natural manner, to determine
the  variation  of  any  selected  single  variable,  or  the  correlation  of 
any  selected  pair  of  variables  for  constant  conditions  or  values  of 
the  other  variables  in  the  complex. 

The  fundamental  theorems  in  partial  correlation  were  developed 
in Pearson’s biometric laboratory (cf. Pearson¹). The notation
now almost universally used in this field is due to Yule,² whose
paper should be carefully studied for the full mathematical
development of the subject, which cannot be gone into here. It is as
follows (Yule, loc. cit., p. 182):

“Let $x_1 \ldots x_n$ denote deviations in the values of the n
variables from their respective arithmetic means. Then the
regression equation may be written:

$$x_1 = b_{12.34 \ldots n}\,x_2 + b_{13.24 \ldots n}\,x_3 + \ldots + b_{1n.23 \ldots n-1}\,x_n \qquad (1)$$



In this notation the suffix of each regression coefficient completely
defines it. The first subscript gives the dependent variable, the
second the variable of which the given regression is the coefficient,
and the subscripts after the period show the remaining independent
variables which enter into the equation. It is convenient to
distinguish the subscripts before and after the period as ‘primary’
and ‘secondary’ subscripts respectively. The order in which the
secondary subscripts are arranged is indifferent, but the order
of the two primary subscripts is material; e.g., $b_{12.3 \ldots n}$ and $b_{21.3 \ldots n}$
denote two quite distinct coefficients. A coefficient with p
secondary subscripts may be termed a regression of the pth
order, the total regression $b_{12}$, $b_{13}$, $b_{23}$, etc. being thus regarded as
of order zero.

“The correlation coefficients may be distinguished by subscripts
in precisely the same manner. Thus the correlation $r_{12.34 \ldots n}$ is
defined by the relation

$$r_{12.34 \ldots n} = \left(b_{12.34 \ldots n} \cdot b_{21.34 \ldots n}\right)^{1/2}.$$

In the case of the correlations, the order of both primary and
secondary subscripts is indifferent. A correlation with p secondary
subscripts may be termed a correlation of order p, the total
correlations $r_{12}$, $r_{13}$, $r_{23}$, etc., being regarded as of order zero.”

Now the essence of the partial correlation calculus is that in
the expression

$$r_{12.34 \ldots n}$$

the variables represented by the secondary subscripts 34 . . . n
are held constant (and therefore their effect upon the total
variation or correlation in the original, unrestrained conditions is
corrected or allowed for), while those represented by the primary
subscripts 1 and 2 are allowed to vary as much as they will under the
restriction that all the others are constant, and the correlation
between variables 1 and 2 under those circumstances is measured.
What this means in terms of biologic realities is this: In the
last chapter it was seen that there was less variation in brain-weight
among the persons composing a single array than among
all the persons in the sample taken together. But this is
precisely what would be expected biologically. For what is a
brain-weight array? It is in this case simply a group of persons so picked
out as to be all alike (within certain narrow limits) in respect of
skull length. Naturally, if they are all alike in skull length they
cannot differ (or vary) very much among themselves in respect
of brain-weight, because of the biologic correlation which exists
between skull-size and brain-weight. Now consider an extension
of the same process. Suppose a group of persons to be selected
all of the same stature, and let measurements be made of the
skull length and brain-weight of each. Plainly, a correlation
table can be set up between skull length and brain-weight in this
group. The resulting coefficient of correlation will be of the sort
$r_{12.3}$, where 1 denotes skull length, 2 denotes brain-weight, and 3
stature. The coefficient will measure the correlation between skull
length and brain-weight for the one particular constant stature, to
which the persons were selected. So, similarly, there might be
picked a group of persons in which all were alike in respect of both
stature and body-weight, let us say, and the correlation between
skull length and brain-weight determined for this group. This
would lead to a correlation of the sort $r_{12.34}$. And so, theoretically,
the process might be continued on to any number of characters in
respect of all of which the persons in the group were so selected
as to be all just alike.

For the arithmetic work of the following numerical example on
this point I am indebted to my colleague, Doctor L. J. Reed.
Some years ago Pearl and Surface* published detailed
measurements of length, breadth, and weight of 453 hens’ eggs. Now
from all these eggs

$$r_{12.3} = -.8955.$$

This coefficient measures for the whole material the net
correlation between length and breadth when weight is held constant by
the application of equation (3) infra.
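
Equation (3) is not reproduced at this point. The sketch below therefore assumes the standard expression for a partial correlation of the first order, r12.3 = (r12 - r13 r23) / √[(1 - r13²)(1 - r23²)], and applies it to purely hypothetical total correlations; these are not the values for the egg data, which are not quoted here. They are chosen only to show how two characters almost uncorrelated in the gross may be strongly and negatively correlated when a third character, with which both are highly correlated, is held constant.

```python
import math

def partial_r(r12, r13, r23):
    """First-order partial correlation of 1 and 2 with 3 held constant
    (standard formula, assumed here to correspond to equation (3))."""
    return (r12 - r13 * r23) / math.sqrt((1 - r13**2) * (1 - r23**2))

# Hypothetical total correlations, for illustration only.
r12, r13, r23 = 0.10, 0.70, 0.70
print(f"r12.3 = {partial_r(r12, r13, r23):.4f}")
```

When the third variable is uncorrelated with both of the others the formula returns the total correlation unchanged, as it should.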

But now suppose from the table of individual measurements
given as an appendix to the paper cited there are picked out all
those eggs that weighed 53 to 53.9 grams, and a correlation
table then constructed, for these selected eggs, between length and
breadth. There were 42 such eggs and the table is shown as
Table 82.

* Pearl, R., and Surface, F. M.: A Biometrical Study of Egg Production in the
Domestic Fowl. III. Variation and Correlation in the Physical Characters of the
Egg, U. S. Dept. Agr. Bur. Anim. Ind., Bulletin 110, pp. 171-241, 1914.

TABLE  82 

Correlation  Between  Egg  Length  and  Breadth,  for  Eggs  Weighing  53  to 

53.9  Grams 

Egg  breadth  (mm.) 


1 

IlO.O 

60.5 

61.0 

JUi.5 

62.0 

62.5 

63.o 

Totals 

51 

- 

- 

- 

- 

- 

1 

1 

2 

52 

- 

- 

- 

- 

1 

1 

1 

3 

53 

- 

- 

- 

- 

1 

1 

2 

56 

- 

- 

- 

6 

3 

- 

- 

9 

55 

- 

- 

2 

3 

- 

- 

- 

5 

56 

1 

1 

6 

- 

- 

- 

- 

8 

57 

2 

3 

2 

1 

- 

- 

- 

8 

58 

- 

1 

1 

- 

- 

- 

- 

2 

59 

2 

1 

- 

- 

- 

- 

- 

3 

Totals 

5 

6 

11 

10 

5 

3 

2 

62 

From this table the coefficient of correlation calculated in the
usual manner described in the preceding chapter is

r12 = -.9117.

It will be noted that this is very close indeed to the value of
r12.3 given above. But let us take another array and see what the
result is. Table 83 gives the correlation between length and
breadth of 46 eggs picked out of the whole lot, each having a
weight between 56 and 56.9 grams.

Here the coefficient worked out in the usual way is

r12 = -.8911,

a result still closer to the r12.3 value given above.

TABLE 83

Correlation Between Egg Length and Breadth for Eggs Weighing 56
to 56.9 Grams

 Egg length                     Egg breadth (mm.)
   (mm.)     40.0   40.5   41.0   41.5   42.0   42.5   43.0   43.5   Totals

    52         -      -      -      -      -      -      -      1       1
    53         -      -      -      -      -      -      -      1       1
    54         -      -      -      -      -      2      4      1       7
    55         -      -      -      -      3      2      2      -       7
    56         -      -      -      2      3      4      -      -       9
    57         -      -      -      6      8      -      -      -      14
    58         -      -      -      1      -      -      -      -       1
    59         -      -      2      -      1      -      -      -       3
    60         1      -      -      -      -      -      -      -       1
    61         -      2      -      -      -      -      -      -       2

  Totals       1      2      2      9     15      8      6      3      46

Let us take one more example, choosing this time eggs which
are near the extreme of weight, instead of arrays near the middle
value. Table 84 gives the length-breadth correlation for 13 eggs each
having a weight between 62 and 62.9 grams, that is, heavy eggs.

TABLE 84

Correlation Between Egg Length and Breadth for Eggs Weighing 62
to 62.9 Grams

 Egg length               Egg breadth (mm.)
   (mm.)     42.5   43.0   43.5   44.0   44.5   45.0   Totals

    55         -      -      -      1      -      2       3
    56         -      -      1      -      1      -       2
    57         -      -      1      2      -      -       3
    58         -      1      1      -      -      -       2
    59         1      2      -      -      -      -       3

  Totals       1      3      3      3      1      2      13

Here, with such a small array, the length-breadth correlation is

r12 = -.8739.


Let us now take a weighted mean of these three length-breadth
correlations (r12). We have:

     -.9117 × 42 = -38.2914
     -.8911 × 46 = -40.9906
     -.8739 × 13 = -11.3607
     Totals  101   -90.6427

Whence

     Mean r12                        = -.8975
     (By partial correlation) r12.3  = -.8955

     Difference                      =  .0020
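The arithmetic of this weighted mean can be retraced in a few lines. The following is only a minimal sketch: the three coefficients and array sizes are those quoted in the text, while the variable names are illustrative assumptions.

```python
# Weighted mean of the three within-array correlations, each weighted
# by the number of eggs in its array (figures as quoted in the text).
arrays = [
    (-0.9117, 42),  # eggs weighing 53 to 53.9 grams (Table 82)
    (-0.8911, 46),  # eggs weighing 56 to 56.9 grams (Table 83)
    (-0.8739, 13),  # eggs weighing 62 to 62.9 grams (Table 84)
]

total_n = sum(n for _, n in arrays)
weighted_sum = sum(r * n for r, n in arrays)
mean_r12 = weighted_sum / total_n

r12_3 = -0.8955  # partial correlation for the whole material, from the text
difference = abs(mean_r12 - r12_3)

print(total_n, round(weighted_sum, 4), round(mean_r12, 4), round(difference, 4))
```

Weighting by array size simply lets the larger arrays, whose coefficients are more reliable, count for more in the mean.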


Thus it is seen, by this process of actual trial, that if we
physically select individuals so that they are all alike relative
to one variable (3) and then directly measure their correlation
in respect of two other variables (1 and 2), the average correlation
(r12) so obtained is substantially identical with the result
which we get mathematically when we calculate the partial
correlation r12.3.

The only difference between the perfectly simple biologic
procedure, which anyone can understand, of selecting individuals
alike in respect of n variables and then measuring the correlation
between two other variables, and the processes implicit in the
arithmetic working out of the equation for a partial correlation
coefficient,

\[
r_{12.34\ldots n} = \frac{r_{12.34\ldots(n-1)} - r_{1n.34\ldots(n-1)}\, r_{2n.34\ldots(n-1)}}{\left(1 - r^{2}_{1n.34\ldots(n-1)}\right)^{1/2}\left(1 - r^{2}_{2n.34\ldots(n-1)}\right)^{1/2}}, \tag{3}
\]

is simply that the mathematical procedure operates upon the
basis of the weighted average variability of all arrays in the manifold
space involved by the variables held constant. In the process of
concrete physical selection of individuals described above one set
of arrays only can be dealt with at one time.

Not only can the correlation between two variables be determined
from equation (3) when a whole series of other characters
are constant, but also the reduction in the variability of any
character as 1, 2, 3 . . . n other variables are held constant can be
measured. The expression for this is

\[
\sigma^{2}_{1.23\ldots n} = \sigma_{1}^{2}\left(1 - r_{12}^{2}\right)\left(1 - r_{13.2}^{2}\right)\cdots\left(1 - r^{2}_{1n.23\ldots(n-1)}\right) \tag{4}
\]


The  arithmetic  of  the  whole  process  is  extremely  simple.  For 
3 variables  equation  (3)  is,  obviously, 


\[
r_{12.3} = \frac{r_{12} - r_{13}\, r_{23}}{\left(1 - r_{13}^{2}\right)^{1/2}\left(1 - r_{23}^{2}\right)^{1/2}} \tag{5}
\]


The zero order correlations r12, r13, and r23 will be calculated from
the observed correlation tables like Table 75 in the preceding
chapter. If we have in the whole system under consideration say
5 variables there will obviously be 29 other possible first order
coefficients as follows: r12.4, r12.5, r13.2, r13.4, r13.5, r14.2, r14.3,
r14.5, r15.2, r15.3, r15.4, r23.1, r23.4, r23.5, r24.1, r24.3, r24.5, r25.1,
r25.3, r25.4, r34.1, r34.2, r34.5, r35.1, r35.2, r35.4, r45.1, r45.2, r45.3.
Each one of these can be determined from the zero order coefficients
just as r12.3 was in (5) above.
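The count of first order coefficients can be checked by enumeration. The sketch below is illustrative only; the label format "rij.k" merely mirrors the notation of the text, and the variable names are assumptions.

```python
from itertools import combinations

variables = [1, 2, 3, 4, 5]

# Each pair (i, j) may be correlated with any one of the three
# remaining variables k held constant, giving the coefficient rij.k.
first_order = [
    f"r{i}{j}.{k}"
    for i, j in combinations(variables, 2)
    for k in variables
    if k not in (i, j)
]

# Setting aside r12.3, already given by equation (5), leaves the others.
others = [name for name in first_order if name != "r12.3"]

print(len(first_order), len(others))
print(", ".join(others))
```

Ten pairs of variables, each with three possible variables held constant, give thirty first order coefficients in all, of which twenty-nine are "other" than r12.3.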

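For the second order coefficients (3) becomes, for example,

\[
r_{12.34} = \frac{r_{12.3} - r_{14.3}\, r_{24.3}}{\left(1 - r_{14.3}^{2}\right)^{1/2}\left(1 - r_{24.3}^{2}\right)^{1/2}}
\]

But we may equally well write

\[
r_{12.34} = \frac{r_{12.4} - r_{13.4}\, r_{23.4}}{\left(1 - r_{13.4}^{2}\right)^{1/2}\left(1 - r_{23.4}^{2}\right)^{1/2}}
\]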

These  two  methods  of  calculation  should  give  the  same  result, 
and,  in  fact,  do,  thus  furnishing  in  actual  practice  a most  useful 
check  on  the  arithmetical  work. 
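The agreement of the two routes may be tried on any consistent set of correlations. The sketch below uses a purely hypothetical set of zero order values for four variables (they are not drawn from any data in this book), and the helper name `partial` is likewise an assumption.

```python
from math import sqrt

def partial(r_xy, r_xz, r_yz):
    """Correlation of x and y after one further variable z is held constant."""
    return (r_xy - r_xz * r_yz) / (sqrt(1 - r_xz ** 2) * sqrt(1 - r_yz ** 2))

# Hypothetical zero order correlations, for illustration only.
r = {(1, 2): 0.60, (1, 3): 0.50, (1, 4): 0.40,
     (2, 3): 0.30, (2, 4): 0.20, (3, 4): 0.10}

# Route A: hold variable 3 constant first, then remove variable 4.
r12_3 = partial(r[1, 2], r[1, 3], r[2, 3])
r14_3 = partial(r[1, 4], r[1, 3], r[3, 4])
r24_3 = partial(r[2, 4], r[2, 3], r[3, 4])
route_a = partial(r12_3, r14_3, r24_3)

# Route B: hold variable 4 constant first, then remove variable 3.
r12_4 = partial(r[1, 2], r[1, 4], r[2, 4])
r13_4 = partial(r[1, 3], r[1, 4], r[3, 4])
r23_4 = partial(r[2, 3], r[2, 4], r[3, 4])
route_b = partial(r12_4, r13_4, r23_4)

print(round(route_a, 4), round(route_b, 4))
```

In hand computation the two results will usually differ in the last figure or so because of rounding at each stage; a larger disagreement points to an arithmetical slip.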

For the third order coefficients (3) takes such forms as

\[
r_{12.345} = \frac{r_{12.34} - r_{15.34}\, r_{25.34}}{\left(1 - r_{15.34}^{2}\right)^{1/2}\left(1 - r_{25.34}^{2}\right)^{1/2}}
\]

And so on, indefinitely, except for the two following limitations:

(a) All the zero order correlations must have linear regressions,
or the method is not valid. Therefore before embarking on an
extensive partial correlation project we should always test the
zero order correlations for linearity in the manner described in the
preceding chapter.

(b) The number of observations in each of the zero order
tables must be fairly large, as compared with the number of
variables dealt with, if the partial correlation results are to be in any
degree conclusive.

It will be noted from the form of equation (3) that if one had
available tables of √(1 - r²) sufficiently detailed so that
interpolation would be unnecessary, the computation of partial
correlation coefficients would become a very simple matter indeed.
Such tables have, in fact, been provided by my colleague, Dr. John
Rice Miner,3 and can be obtained from the Johns Hopkins Press
at a nominal price.

ILLUSTRATION  OF  PARTIAL  CORRELATION 

In order that the reader may become thoroughly familiar with
the operation of the useful partial correlation technic, a numerical
example will now be presented in detail. The example is drawn
from the writer’s (Pearl4) studies on the epidemiology of
influenza.

The problem set is this: What is the net correlation between
the destructiveness of the 1918-1919 influenza epidemic in large
American cities and the normal death-rate in the same cities from
organic diseases of the heart, when all the cities are made
constant in respect of (a) the age constitution of the population, (b)
the sex ratio of the population, and (c) the density of population?

The  data  are  taken  from  Pearl.4  The  subscripts  have  the 
following  significance: 

Subscript 2 denotes the destructiveness of the epidemic, measured
by the twenty-five-week excess mortality rates calculated
and published by the Bureau of the Census. These twenty-five-week
excess rates indicate the number of people dying from all
causes, during the twenty-five weeks following the initial outbreak
of the epidemic in this country in the autumn of 1918, in
excess of the number who probably would have died in the same
period had no epidemic occurred. The rates for the 34 cities are
given in Table 1 (p. 12) of Influenza Studies I, and hence need
not be reprinted here.

Subscript  3 denotes  the  normal  death-rate  in  each  city  from 
organic  diseases  of  the  heart,  averaged  for  the  three  years  1915, 
1916,  and  1917. 

Subscript 4 denotes the age distribution of the population, as
measured by an age-constitution index having the form

\[
\phi = S\left\{\frac{\Delta^{2}}{P}\right\}\left(M - M_{P}\right)
\]

where Δ is the deviation for each of six age groups (viz., 0-4, 5-14,
15-24, 25-44, 45-64, 65 and over) of the percentage of the actual
population of each city in 1910 in each age group, from the
percentage in the same group in the standard population of Glover’s
life table, denoted in the formula by P; S denotes summation of
all six values; M = mean age of living population in any
community; Mp = mean age of persons in a stationary population
unaffected by migration and which, assuming the mortality rates
of Glover’s life table, would result if 100,000 persons were born
alive uniformly throughout each year (Mp calculated from Lx
line of Glover’s table (p. 16) = 33.796 years).

Subscript  5 denotes  the  ratio  of  males  to  100  females  in  each 
of  the  cities  in  1910. 

Subscript 6 denotes density of population calculated from data
furnished in the “Financial Statistics of Cities,” issued annually
by the Bureau of the Census, and was expressed as the number
of persons per acre of land area within the legally defined limits of
the city.

The values of the zero-order correlations and the first order
coefficients derived from them are given in Table 85, which includes
all the figures set down in making the calculations, the
multiplications and divisions having been made on a calculating
machine.

TABLE 85

Partial Correlations. Influenza. Zero and First Order Coefficients

     r zero order                     Product                               r first order
 Subscript  Coefficient  (1 - r²)^½   term of     Whole      Denominator  Subscript  Coefficient
                                      numerator   numerator

    23        +.4874                   +.0145      +.4729       .7928       23.4       +.5965
    24        +.0238       .9997
    34        +.6093       .7930

    23        +.4874                   +.0050      +.4824       .9853       23.5       +.4896
    25        -.0295       .9996
    35        -.1682       .9857

    23        +.4874                   -.0177      +.5051       .9811       23.6       +.5148
    26        +.1108       .9938
    36        -.1595       .9872

    24        +.0238       .9997       +.0035      +.0203       .9926       24.5       +.0205
    25        -.0295       .9996       -.0028      -.0267       .9927       25.4       -.0269
    45        -.1184       .9930

    24        +.0238       .9997       -.0259      +.0497       .9663       24.6       +.0514
    26        +.1108       .9938       -.0056      +.1164       .9720       26.4       +.1198
    46        -.2338       .9723

    25        -.0295       .9996       +.0017      -.0312       .9937       25.6       -.0314
    26        +.1108       .9938       -.0005      +.1113       .9995       26.5       +.1114
    56        +.0155       .9999

    34        +.6093       .7930       +.0199      +.5894       .9788       34.5       +.6022
    35        -.1682       .9857       -.0721      -.0961       .7874       35.4       -.1220
    45        -.1184       .9930

    34        +.6093       .7930       +.0373      +.5720       .9598       34.6       +.5960
    36        -.1595       .9872       -.1425      -.0170       .7710       36.4       -.0220
    46        -.2338       .9723

    35        -.1682       .9857       -.0025      -.1657       .9871       35.6       -.1679
    36        -.1595       .9872       -.0026      -.1569       .9856       36.5       -.1592
    56        +.0155       .9999

    45        -.1184       .9930       -.0036      -.1148       .9722       45.6       -.1181
    46        -.2338       .9723       -.0018      -.2320       .9929       46.5       -.2337
    56        +.0155       .9999       +.0277      -.0122       .9655       56.4       -.0126

The computations go in this way, taking the upper block of
Table 85. To get the product term of the numerator of equation (3),
r24 = .0238 is multiplied by r34 = .6093, giving the result .0145,
set down in the column headed “Product term of numerator.”
The two elements in the denominator, √(1 - .0238²) and
√(1 - .6093²), are read off from Miner’s Tables, as .9997
and .7930 respectively. The whole numerator is .4874 -
.0145 = .4729, while the denominator is .9997 × .7930 = .7928.

Finally r23.4 = .4729/.7928 = +.5965. And so on for the other cases.
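The steps just described for the upper block of Table 85 can be followed in a short sketch. The zero order values and the two square roots are taken as printed in the text; the rounding to four places at each stage is an assumption about how the tabular figures were set down, and the variable names are illustrative.

```python
from math import sqrt

# Zero order correlations from the upper block of Table 85.
r23, r24, r34 = 0.4874, 0.0238, 0.6093

# Square-root elements as read from the tables, per the text.
root_24, root_34 = 0.9997, 0.7930

# Tabular route: each intermediate figure is rounded to four places.
product_term = round(r24 * r34, 4)
whole_numerator = round(r23 - product_term, 4)
denominator = round(root_24 * root_34, 4)
r23_4_tabular = round(whole_numerator / denominator, 4)

# Direct route: equation (5) evaluated without intermediate rounding.
r23_4_direct = (r23 - r24 * r34) / (sqrt(1 - r24 ** 2) * sqrt(1 - r34 ** 2))

print(product_term, whole_numerator, denominator, r23_4_tabular)
print(round(r23_4_direct, 4))
```

The two routes may differ by a unit in the fourth decimal place, which is merely the effect of rounding the intermediate figures and is of no practical consequence.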

The calculation of the second order coefficients is given in
Table 86, which is of exactly the same form as Table 85, except that
each second order coefficient is calculated in two different ways
(i. e., with two different sets of first-order coefficients) as a check
on the arithmetic.

Finally, Table 87 gives the third order coefficient in which we
are interested, again calculated in two ways as a check.

TABLE 86

Partial Correlations. Influenza. First and Second Order Coefficients

     r first order                    Product                               r second order
 Subscript  Coefficient  (1 - r²)^½   term of     Whole      Denominator  Subscript  Coefficient
                                      numerator   numerator

   23.4       +.5965                   +.0033      +.5932       .9921       23.45      +.5979
   25.4       -.0269       .9996
   35.4       -.1220       .9925

   23.5       +.4896                   +.0123      +.4773       .7981       23.45      +.5980
   24.5       +.0205       .9998
   34.5       +.6022       .7983

   23.4       +.5965                   -.0026      +.5991       .9926       23.46      +.6036
   26.4       +.1198       .9928
   36.4       -.0220       .9998

   23.6       +.5148                   +.0306      +.4842       .8019       23.46      +.6038
   24.6       +.0514       .9986
   34.6       +.5960       .8030

   23.5       +.4896                   -.0177      +.5073       .9811       23.56      +.5171
   26.5       +.1114       .9938
   36.5       -.1592       .9872

   23.6       +.5148                   +.0053      +.5095       .9853       23.56      +.5171
   25.6       -.0314       .9995
   35.6       -.1679       .9858

   25.4       -.0269       .9996       -.0015      -.0254       .9927       25.46      -.0256
   26.4       +.1198       .9928       +.0003      +.1195       .9995       26.45      +.1196
   56.4       -.0126       .9999

   24.5       +.0205       .9998
   26.5       +.1114                   -.0048      +.1162       .9721       26.45      +.1195
   46.5       -.2337       .9723

   24.6       +.0514       .9986
   25.6       -.0314                   -.0061      -.0253       .9916       25.46      -.0255
   45.6       -.1181       .9930

   35.4       -.1220       .9925       +.0003      -.1223       .9997       35.46      -.1223
   36.4       -.0220       .9998       +.0015      -.0235       .9924       36.45      -.0237
   56.4       -.0126       .9999

   34.5       +.6022       .7983
   36.5       -.1592                   -.1407      -.0185       .7762       36.45      -.0238
   46.5       -.2337       .9723

   34.6       +.5960       .8030
   35.6       -.1679                   -.0704      -.0975       .7974       35.46      -.1223
   45.6       -.1181       .9930

TABLE 87

Partial Correlations. Influenza. Second and Third Order Coefficients

     r second order                   Product                               r third order
 Subscript  Coefficient  (1 - r²)^½   term of     Whole      Denominator  Subscript  Coefficient
                                      numerator   numerator

   23.45      +.5979                   -.0028      +.6007       .9925       23.456     +.6052
   26.45      +.1195       .9928
   36.45      -.0237       .9997

   23.46      +.6037                   +.0031      +.6006       .9922       23.456     +.6053
   25.46      -.0255       .9997
   35.46      -.1223       .9925

From this we see that there was a relatively high net or partial
correlation between destructiveness of the epidemic outbreak and
normal cardiac death-rate, the coefficient being

r23.456 = +.605 ± .073,

when the demographic variables of age, sex, and density are held
constant.

It should be noted that the probable error of a partial correlation
of higher order is of the same form as that of a zero order
coefficient (see Chapter XIV).
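The whole chain from the zero order correlations of Table 85 to the third order coefficient can be sketched as a repeated application of equation (3). The recursive function below, and its name `partial_r`, are illustrative assumptions and not the tabular procedure of the text. The probable error is taken here as .6745 (1 - r²)/√n with n = 34 cities, which is assumed to be the form referred to in Chapter XIV.

```python
from math import sqrt

# Zero order correlations as tabulated in Table 85.
ZERO_ORDER = {
    (2, 3): 0.4874, (2, 4): 0.0238, (2, 5): -0.0295, (2, 6): 0.1108,
    (3, 4): 0.6093, (3, 5): -0.1682, (3, 6): -0.1595,
    (4, 5): -0.1184, (4, 6): -0.2338, (5, 6): 0.0155,
}

def partial_r(i, j, held=()):
    """Correlation of i and j with the variables in `held` kept constant."""
    if not held:
        return ZERO_ORDER[(min(i, j), max(i, j))]
    *rest, k = held          # the last held variable is removed by equation (3)
    rest = tuple(rest)
    r_ij = partial_r(i, j, rest)
    r_ik = partial_r(i, k, rest)
    r_jk = partial_r(j, k, rest)
    return (r_ij - r_ik * r_jk) / sqrt((1 - r_ik ** 2) * (1 - r_jk ** 2))

r_a = partial_r(2, 3, (4, 5, 6))   # variable 6 removed last
r_b = partial_r(2, 3, (4, 6, 5))   # variable 5 removed last, as a check

n_cities = 34
probable_error = 0.6745 * (1 - r_a ** 2) / sqrt(n_cities)

print(round(r_a, 3), round(r_b, 3), round(probable_error, 3))
```

Because no intermediate figure is rounded here, the result may differ slightly in the fourth decimal place from the values set down in Table 87.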

The student should read some of the extended investigations
which have been made by the partial correlation method,
particularly that of Miner.5


CHAPTER XVI


SIMPLE  CURVE  FITTING 

The  worker  in  practically  any  branch  of  science  is  more  or 
less  frequently  confronted  with  this  sort  of  problem:  he  has  a 
series  of  observations  in  which  there  is  clear  evidence  of  a certain 
orderliness,  on  the  one  hand,  and  evident  fluctuations  from 
this  order,  on  the  other  hand.  What  he  obviously  wishes  to  do, 
on  the  basis  of  a quite  sound  instinct,  is  to  emphasize  the  orderli- 
ness and  minimize  the  fluctuations  about  it.  His  reasoning, 
deeply  rooted  in  racial  experience  of  more  or  less  scientific  matters, 
is  that  the  orderliness  of  which  he  sees  traces,  if  really  there,  de- 
pends upon  a true  lawful  relation  between  the  variables  he  is 
studying,  and  that  the  fluctuations  are  in  general  merely  accidents 
of  random  sampling.  He  would  like  an  expression,  exact  if  pos- 
sible, or,  failing  that,  approximate,  of  the  law  if  there  be  one.  This 
means  a mathematical  expression  of  the  functional  relation  between 
the  variables. 

It  seems  desirable  to  give  the  medical  man  some  little  introduc- 
tion to  the  methods  which  the  followers  of  the  sciences  at  the  moment 
more  exact  than  medicine,  use  in  fitting  together  mathematical  ex- 
pressions and  observational  data.  It  should  be  made  clear  at  the 
start  that  there  is,  unfortunately,  no  method  known  to  mathematics 
which  will  tell  anyone  in  advance  of  the  trial  what  is  either  the  cor- 
rect or  even  the  best  mathematical  function  with  which  to  graduate  a 
particular  set  of  data.  The  choice  of  the  proper  mathematical  func- 
tion is  essentially,  at  its  very  best,  only  a combination  of  good  judg- 
ment and  good  luck.  In  this  realm,  as  in  every  other,  good  judgment 
depends  in  the  main  only  upon  extensive  experience.  What  we  call 
good  luck  in  this  sort  of  connection  has  also  about  the  same  basis. 
The  experienced  person  in  this  branch  of  applied  mathematics  knows 
at  a glance  what  general  class  of  mathematical  expression  will 
take a course, when plotted, on the whole like that followed by
the observations. He furthermore knows that by putting as
many  constants  into  his  equation  as  there  are  observations  in  the 
data  he  can  make  his  curve  hit  all  the  observed  points  exactly, 
but  in  so  doing  will  have  defeated  the  very  purpose  with  which 
he  started,  which  was  to  emphasize  the  law  (if  any)  and  minimize 
the  fluctuations,  because  actually  if  he  does  what  has  been  de- 
scribed he  emphasizes  the  fluctuations  and  probably  loses  completely 
any  chance  of  discovering  a law. 

Of  mathematical  functions  involving  a small  number  of  con- 
stants there  are  but  relatively  few.  If  one  takes  account  of  that 
group of curves which in his youth he studied under the name of
“conic sections,” adds to it the curves which derive from the
trigonometrical functions, and fills out the equipment with the
logarithmic-exponential family, he will not have exhausted the
possibilities of curves with few constants, but he will have included
the  great  bulk  of  the  mathematical  functions  which  have  so  far 
been  found  to  be  of  wide  utility  in  expressing  the  laws  of  nature. 
In  short,  we  live  in  a world  which  appears  to  be  organized  in  ac- 
cordance with  relatively  few  and  relatively  simple  mathematical 
functions.  Which  of  these  one  will  choose  in  starting  off  to  fit 
empirically  a group  of  observations  depends  fundamentally,  as  has 
been  said,  only  on  good  judgment  and  experience.  There  is  no  higher 
guide. 

Of  the  observational  data  which  the  medical  man  has  occasion 
or  desire  to  graduate  (which  means  fit  a curve  to)  perhaps  the 
most  frequent  will  be  those  in  which  there  is  a definite  trend  up 
or  down,  or  first  in  one  direction  and  then  in  the  other.  It  is  pro- 
posed now  to  show  briefly  how  to  fit  three  simple  functions,  namely, 
a straight  line,  a second-order  parabola,  and  a logarithmic  curve,  to 
such  data.  The  method  which  will  be  used  is  that  known  in 
mathematics  as  the  “method  of  least  squares,”  but  the  reader 
should  not  let  this  discourage  him.  It  is  really  very  simple.  If 
he  wants  to  know  about  its  foundation  perhaps  the  best  thing  to 
read  is  a short  paper  by  Ellis.1  If  he  prefers  a more  detailed 
mathematical  approach  than  the  present  one,  both  specifically  and 
in  general,  to  curve  fitting  problems,  Running’s2  book,  or  the 
excellent text on least squares of Brunt3 can be recommended, or,
perhaps most useful of all, Whittaker and Robinson’s4
comprehensive treatise.

After  one  has,  on  the  basis  of  his  general  judgment  of  the 
whole  situation,  chosen  a particular  function  with  which  to  graduate 
a set  of  data,  the  theory  of  least  squares  says  that  “the  best  fitting” 
curve  is  that  particular  one,  out  of  the  whole  range  given  by  the 
chosen  function,  which  makes  the  sum  of  the  squares  of  the  dif- 
ferences between  the  observed  points  and  the  corresponding  points 
on  the  fitted  curve  a minimum.  This,  it  should  clearly  be  under- 
stood, is  simply  a convention.  Other  conventions  quite  as  sound 
and  well  justified  could  be,  and  have  been,  used.  For  example,  it 
may  be  said  that,  under  the  same  initial  premise  as  before,  the 
“best  fitting  curve”  shall  be  that  one  having  its  area  and  moments 
equal  to  the  area  and  moments  of  the  observations.  If  one  fol- 
lows this  definition  he  fits  by  the  method  of  moments;  if  he  follows 
the  first  definition  he  fits  by  the  method  of  least  squares.  We 
have  chosen  for  discussion  here  the  least  square  definition. 

Take  as  the  equation  to  a straight  line 

y = a + bx. 

Now,  plainly,  the  difference  between  any  observation  and  this 
curve  (for  a straight  line  is  a curve  of  zero  curvature)  will  be 

(y - a - bx).

There  will  be  as  many  of  such  differences  as  there  are  observations. 
The  theory  of  least  squares  insists  that  values  for  the  constants 
a and  b be  so  chosen  that 


S (y - a - bx)²,

where  S denotes  summation,  shall  be  a minimum.  How  shall  we 
determine  from  the  observations  the  values  of  a and  b which  will 
fulfil  this  requirement? 

This is done by solving two equations (since there are two
constants to be determined) which are known technically as the
normal equations. How it is known that they are the right
equations, in respect of their form, comes about from an application
of certain principles of the differential calculus, which need not be
gone into here. The normal equations for fitting a straight line are

S (y) - n a - b S (x) = 0
S (xy) - a S (x) - b S (x²) = 0

Transposing terms in form for computation these become

n a + b S (x) = S (y)
a S (x) + b S (x²) = S (xy),

where n is the number of observed points.
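For the reader who wishes to see where the normal equations come from, the step from the calculus may be sketched as follows. The symbol Q for the sum of squares is introduced here only for convenience and is not used elsewhere in the text.

```latex
\[
Q(a, b) = S\,(y - a - bx)^{2}
\]
% Q is a minimum where both of its partial derivatives vanish:
\[
\frac{\partial Q}{\partial a} = -2\,S\,(y - a - bx) = 0,
\qquad
\frac{\partial Q}{\partial b} = -2\,S\,x\,(y - a - bx) = 0 .
\]
% Dividing by -2 and summing term by term (S of the constant a over n points is na):
\[
S(y) - na - b\,S(x) = 0,
\qquad
S(xy) - a\,S(x) - b\,S(x^{2}) = 0 .
\]
```

These are the two normal equations as written above, one for each constant to be determined.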

The location of the points on the abscissal scale can, of course,
take origin from any place one pleases. It is convenient, since
usually the observations are equally spaced on the x axis, to take
origin of x at one abscissal unit below the first observation. Then
the x of the first observation is 1, that of the second 2, and so on;
and the sum of the x’s (S (x)) and S (x²) can be read directly
from tables of the sums of the powers of the natural numbers (as
in Pearson’s Tables). All of this is merely another way of saying
that in curve fitting just as in the calculation of frequency
constants (cf. earlier chapters) it is convenient to work in abscissal
units of grouping rather than in concrete units such as pounds,
feet, etc. S (y) will be readily got simply by summing the
observed points (the numerical values of the ordinates). S (xy)
involves multiplying each x by its y and summing.

TABLE 88

Mean Sitting Heights of Embryo. Curve Fitting

 Weight of   Mean sitting                                           Calculated   Calculated
 embryo in   height in mm.    x       xy         x²y      y log x   y from       y from
 grams.           y                                                 parabola.    log curve.

   0- 19         58.8         1       58.8        58.8     0           66.9         55.9
  20- 39         76.4         2      152.8       305.6    22.9987      77.3         78.1
  40- 59         91.1         3      273.3       819.9    43.4658      87.1         91.7
  60- 79         99.0         4      396.0     1,584.0    59.6039      96.3        101.8
  80- 99        108.1         5      540.5     2,702.5    75.5587     105.0        110.0
 100-119        115.1         6      690.6     4,143.6    89.5652     113.2        117.0
 120-139        122.7         7      858.9     6,012.3   103.6935     120.7        123.2
 140-159        129.5         8    1,036.0     8,288.0   116.9502     127.8        128.7
 160-179        135.0         9    1,215.0    10,935.0   128.8227     134.3        133.7
 180-199        141.1        10    1,411.0    14,110.0   141.1000     140.2        138.4
 200-219        144.0        11    1,584.0    17,424.0   149.9605     145.5        142.8
 220-239        150.0        12    1,800.0    21,600.0   161.8772     150.3        147.0
 240-259        152.8        13    1,986.4    25,823.2   170.2106     154.6        150.9
 260-279        155.6        14    2,178.4    30,497.6   178.3375     158.3        154.7
 280-299        158.6        15    2,379.0    35,685.0   186.5281     161.4        158.3
 300-319        161.3        16    2,580.8    41,292.8   194.2246     164.0        161.8
 320-339        160.5        17    2,728.5    46,384.5   197.4870     166.0        165.1
 340-359        171.0        18    3,078.0    55,404.0   214.6516     167.5        168.4
 360-379        169.5        19    3,220.5    61,189.5   216.7487     168.4        171.5
 380-399        173.6        20    3,472.0    69,440.0   225.8588     168.8        174.6

 Totals       2673.7          -   31,640.5   453,700.3  2677.6433

The  best  way  to  show  how  simple  this  all  is  will  be  to 
work  out  an  example.  This  is  done  in  Table  88.  The  data 
are  drawn  from  Table  80  in  Chapter  XIV,  and  consist  of  the 
mean  sitting  heights  of  human  embryos.  The  figures  constitute 
the  observed  regression  line  of  sitting  height  on  weight. 

From  Table  88,  and  a table  of  the  sums  of  the  powers  of  the 
natural  numbers,  we  have, 

n = 20
S (x) = 210
S (x²) = 2870
S (y) = 2673.7
S (xy) = 31640.5

Whence  the  equations  are 

20  a + 210  b = 2673.7 
210  a + 2870  b = 31640.5 

Solving,  we  get 

a = 77.37 
b = 5.36 

y = 77.37  + 5.36  x. 

We  next  proceed  to  calculate  the  value  of  y (sitting  height) 
for  two  values  of  x as  follows: 

When

x = 1, y = 82.73

x = 20, y = 184.64

The  line  can  then  be  drawn.  The  result  is  shown  graphically 
in  Fig.  87. 
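The fitting of the straight line may be retraced in the following sketch. The ordinates are the mean sitting heights of Table 88; the closed forms n(n+1)/2 and n(n+1)(2n+1)/6 stand in for the printed tables of sums of powers, and the solution by determinants is merely one convenient way of solving two equations, not necessarily the one used in the text.

```python
# Mean sitting heights (mm.) for the 20 weight classes of Table 88.
y = [58.8, 76.4, 91.1, 99.0, 108.1, 115.1, 122.7, 129.5, 135.0, 141.1,
     144.0, 150.0, 152.8, 155.6, 158.6, 161.3, 160.5, 171.0, 169.5, 173.6]
n = len(y)
x = list(range(1, n + 1))   # abscissae in class units: 1, 2, ..., 20

# Sums of the powers of the natural numbers, from their closed forms.
s_x = n * (n + 1) // 2
s_x2 = n * (n + 1) * (2 * n + 1) // 6

s_y = sum(y)
s_xy = sum(xi * yi for xi, yi in zip(x, y))

# Normal equations:  n a + b S(x) = S(y);  a S(x) + b S(x²) = S(xy).
det = n * s_x2 - s_x ** 2
a = (s_y * s_x2 - s_x * s_xy) / det
b = (n * s_xy - s_x * s_y) / det

def line(x_value):
    """Ordinate of the fitted straight line at a given abscissa."""
    return a + b * x_value

print(s_x, s_x2, round(s_y, 1), round(s_xy, 1))
print(round(a, 2), round(b, 2), round(line(1), 2), round(line(20), 2))
```

Two ordinates suffice to draw the line, which is why only x = 1 and x = 20 are evaluated.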

It is apparent that a straight line is not the mathematical
function best adapted to fit these observations. This was already
known from the value of η² - r² in this case, which proved that
this was non-linear regression (cf. p. 392).

A parabola may be fitted next to the data. Its equation is

y = a + b x + c x²

The normal equations now are three in number, since this is a
three constant equation, as follows:

n a + b S (x) + c S (x²) = S (y)
a S (x) + b S (x²) + c S (x³) = S (xy)
a S (x²) + b S (x³) + c S (x⁴) = S (x²y)

Filling  in  the  values  from  Table  88  these  become 

20  a+  210  b + 2870  c = 2673.7 

210  a + 2870  b + 44,100  c = 31,640.5 

2870  a + 44,100  b + 722,666  c = 453,700.3 


[Fig. 87. Observed mean sitting heights of embryos (circles) and straight line fitted
by least squares. Abscissa: weight in grams.]

Solving, 

y = 55.986  + 11.195  x - .278  x2 

Substituting  successive  values  of  x and  solving  for  y gives  the 
values  of  the  ordinates  of  the  curve  exhibited  in  the  last  column 
but  one  of  Table  88.  It  is  at  once  apparent  that  the  parabola 
comes  closer  to  the  observation  than  the  straight  line,  but  it  still 
is  a poor  fit. 
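The three normal equations for the parabola can be solved mechanically. The sketch below uses the numerical coefficients as filled in from Table 88; the elimination routine `solve` is an illustrative helper and an assumption of this sketch, not the method of solution used in the text.

```python
def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    size = len(rhs)
    aug = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(size):
        # Bring the row with the largest pivot into position.
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, size):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, size + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution.
    solution = [0.0] * size
    for r in range(size - 1, -1, -1):
        tail = sum(aug[r][c] * solution[c] for c in range(r + 1, size))
        solution[r] = (aug[r][size] - tail) / aug[r][r]
    return solution

# Coefficients of the normal equations for y = a + b x + c x².
normal_matrix = [
    [20, 210, 2870],
    [210, 2870, 44100],
    [2870, 44100, 722666],
]
right_side = [2673.7, 31640.5, 453700.3]

a, b, c = solve(normal_matrix, right_side)
print(round(a, 3), round(b, 3), round(c, 3))
print(round(a + b * 1 + c * 1 ** 2, 1))   # calculated y at x = 1
```

The constants so found should agree with those of the fitted equation to within a unit or so in the last place printed.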

The result is shown graphically in Fig. 88.

[Fig. 88. Observed mean sitting height of embryo (circles) and parabola of the second
order fitted by least squares. Abscissa: weight in grams.]

Turning to the logarithmic curve the equation we shall use is

y = a + b x + c log x

It may be well at this point to say a word as to the reasoning
which leads to the choice of this particular form of a logarithmic
curve. If one had had no pedagogic purpose in mind, this is the
one of the three curves which would have been chosen in the first
instance, and no straight line or parabola would have been fitted.
It is apparent to anyone of experience in such matters that the
first 6 or 8 observations are curving too rapidly to be capable of
representation by a second order parabola, if the same parabola
is to come anywhere near the remaining observations. At the low
values of x a logarithmic curve is curving relatively rapidly as
compared with what it does at higher values of x. But this is
precisely what the observations in this case actually do. Hence
one perceives that there is needed in the equation a term in log x.
But it is further seen that the observations are more spread out
horizontally, that is, the whole series is flatter, than could be
represented by

y = c log x

whatever value might be given to c. So there is put in a line
term, b x, which has the effect of stretching the curve horizontally.
Finally, since all the observations have fairly considerable values
(starting at 58.8) it will be desirable to put in a constant term a
to raise the general level, from which the terms in x operate, up to
a reasonable point.

For  the  form  of  logarithmic  curve  chosen  the  normal  equations 
are: 

$$na + b\,S(x) + c\,S(\log x) = S(y)$$

$$a\,S(x) + b\,S(x^2) + c\,S(x \log x) = S(xy)$$

$$a\,S(\log x) + b\,S(x \log x) + c\,S[(\log x)^2] = S(y \log x)$$

The numerical values here are again drawn from Table 88 and,
for S(log x), S(x log x) and S(log x)^2, from the table of sums of
logarithmic functions given as Appendix V of this book.

The  final  equations  are 

20  a + 210  b + 18.3861246  c = 2673.7 

210  a + 2870  b + 230.0033043  c = 31640.5 

18.3861246  a + 230.0033043  b + 19.2694686  c = 2677.6433 

Solving,  we  have 

$$y = 54.347 + 1.555x + 68.549 \log x$$

Substituting  successive  values  of  x as  before  and  solving  for  y 
gives  the  values  in  the  last  column  of  Table  88,  which  are  shown 
graphically  in  comparison  with  the  observations  in  Fig.  89. 
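
The solution of the three normal equations can be checked numerically. The following is only a sketch in Python (not part of the original text); it assumes, as the sums n = 20, S(x) = 210 and S(x^2) = 2870 suggest, that the abscissae are x = 1, 2, ..., 20, and it takes the right-hand sides from the final equations above. The helper `solve_linear` is a hypothetical name for ordinary Gaussian elimination.

```python
import math

def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                   # back-substitution
        s = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

xs = range(1, 21)                                    # assumption: x = 1..20
s_log = sum(math.log10(x) for x in xs)               # S(log x)
s_xlog = sum(x * math.log10(x) for x in xs)          # S(x log x)
s_log2 = sum(math.log10(x) ** 2 for x in xs)         # S(log x)^2

normal = [
    [20.0, 210.0, s_log],
    [210.0, 2870.0, s_xlog],
    [s_log, s_xlog, s_log2],
]
rhs = [2673.7, 31640.5, 2677.6433]                   # S(y), S(xy), S(y log x)
a, b, c = solve_linear(normal, rhs)
print(round(a, 3), round(b, 3), round(c, 3))
```

The computed sums agree with the tabled values to the places printed, and the constants come out close to those given in the text; small differences in the last figures are to be expected from rounding in hand computation.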

It is at once apparent that we now have a much more satisfactory
graduation than any attained in the other trials. We
could  do  still  better  by  introducing  another  term  in  the  equation, 
but,  on  the  whole,  the  present  result  may  be  taken  as  reasonably 
satisfactory. 

A final word may be said as to the writing of normal equations
in fitting by least squares. In the first place it must always be
remembered that the method cannot be applied directly in any
case where any one of the functions of the independent variable
involves an arbitrary constant. If, for example, in fitting a log
curve we wish to use a term in the equation of the form log (a + x),
which it is often convenient to do because it changes the origin of
the log term without correspondingly changing the origin of the
terms in simple powers of x, it is necessary to go through a
roundabout process of trial and error to get a proper value of a. It
cannot be determined directly by the least square method.

Fig. 89. Observed mean sitting heights of embryo (circles), and a logarithmic
curve fitted by least squares.

But  with  this  caution  in  mind  we  can  lay  down  a series  of  rules 
as  follows: 

1. Write the equation of the curve it is proposed to fit with the
summation sign S before the variable, in each term which contains
a variable (i. e., x or y) and write n before any term which does not
contain a variable. Call the equation (i).

2. Multiply each term in (i) by the function of x (x itself,
x^2, x^3, log x, etc.) that has for its coefficient the first constant in
(i), writing S before the variable in each case, and dropping the n
which appears in (i).

3. Multiply each term in (i) by the function of x that has for
its coefficient the second constant in (i), writing S before the
variable in each case as before.

4. Continue this process till (i) has been successively
multiplied in this way by each function of x which appears in it. This
will make as many equations (including (i)) as there are constants
to determine.

5.  Perform  the  indicated  summations  and  solve  the  system  of 
simultaneous  equations  for  the  unknowns. 
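
The rules above amount to forming every product-sum S(f_i f_j) and S(f_i y) for the functions f_i of x that appear in the curve. A minimal sketch in Python follows (not part of the original text); the function name `normal_equations` and the small data set are hypothetical, chosen only to illustrate the bookkeeping.

```python
import math

def normal_equations(basis, xs, ys):
    """Return the matrix of S(f_i * f_j) and the vector of S(f_i * y)
    for a curve y = const_1 * f_1(x) + const_2 * f_2(x) + ..."""
    rows = [[f(x) for f in basis] for x in xs]
    k = len(basis)
    matrix = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
              for i in range(k)]
    rhs = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return matrix, rhs

# Curve y = a + b x + c log x: the functions of x are 1, x and log x.
basis = [lambda x: 1.0, lambda x: float(x), lambda x: math.log10(x)]
xs = [1, 2, 3, 4, 5]
true_constants = [2.0, 0.5, 3.0]                      # made-up illustration
ys = [sum(k * f(x) for k, f in zip(true_constants, basis)) for x in xs]

matrix, rhs = normal_equations(basis, xs, ys)
for row, value in zip(matrix, rhs):
    print([round(v, 4) for v in row], round(value, 4))
```

The first row reproduces equation (i) of rule 1 (with n in the place of the term without a variable), and each later row is (i) multiplied through by the next function of x. Because the illustrative data lie exactly on the curve, the constants used to generate them satisfy every equation.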
CHAPTER XVII


THE  LOGISTIC  CURVE 


In  1838  a Belgian  mathematician,  P.  F.  Verhulst1  published  a 
note,  later  to  be  followed  by  two  longer  memoirs,  suggesting  the 
use  of  a curve  which  he  called  the  “logistic”  to  describe  the  growth 
of  human  populations.  His  work  was  for  many  years  forgotten. 
In  1918  Du  Pasquier2  called  attention  to  Verhulst’s  work.  In 
1920  Pearl  and  Reed,3  without  knowing  of  Verhulst’s  contribution, 
independently  derived  the  logistic  curve,  as  an  empirical  curve  to 
meet  certain  postulates  for  a curve  to  describe  the  growth  of  a 
population.  It  was  held  that  a curve  to  describe  adequately  the 
growth  of  population  in  an  area  of  fixed  limits  should  fulfill  the 
following  conditions: 

1. Asymptotic to a line y = k, when x = +∞.

2. Asymptotic to a line y = 0, when x = -∞.

3. A point of inflection at some time x = α and y = β.

4. Concave upward to left of x = α and concave downward to
right of x = α.

5. No horizontal slope except at x = ±∞.

6. Values of y varying continuously from 0 to k as x varies
from -∞ to +∞.

These postulates led to the simple, symmetrical logistic curve
of Verhulst

$$y = \frac{K}{1 + Ce^{rt}} \qquad \text{(i)}$$

where y denotes population, t denotes time, and K, C, and r are
constants.

Shortly after the first paper by Pearl and Reed a rational, as
distinguished from an empirical, derivation of the curve was given
by Lotka in his important book “Elements of Physical Biology.”
Prior to this work on population growth a number of persons,
notably the late Dr. T. Brailsford Robertson, on the basis of an assumed
analogy between organic growth and chemical autocatalysis, had
used the same curve to describe the growth in size of an individual
organism.

In  practical  work  with  the  logistic  curve  it  soon  became  apparent 
that  Pearl  and  Reed’s  postulate  2 was  too  rigid;  that  in  fact  the 
lower  asymptote  of  the  curve  often  was  not  zero  but  was  distant 
from  zero  by  some  amount  which  may  be  called  d.  The  curve 
becomes 

$$y = d + \frac{K}{1 + Ce^{rt}} \qquad \text{(ii)}$$

Also  it  became  apparent  that  any  complete  theory  of  population 
growth  demanded  recognition  of  its  occasionally  cyclic  character, 
together  with  the  possibility  of  skew  as  well  as  symmetrical  growth. 
This  led  to  the  generalized  logistic4 


$$y = d + \frac{K}{1 + Ce^{a_1 t + a_2 t^2 + a_3 t^3 + \cdots + a_n t^n}} \qquad \text{(iii)}$$


The  logistic  curve  has  in  recent  years  been  extensively  discussed, 
and  its  usefulness  demonstrated  for  the  description  of  many  sorts 
of  phenomena.5  This  fact  appears  to  justify  the  inclusion  in  this 
text-book  of  a brief  description,  with  a numerical  example,  of  the 
method of fitting this curve, a varied and extensive experience
having shown the method here described to be simple and accurate.

The equation to the logistic may be written in the form

$$y = \frac{K}{1 + e^{a + rt}} \qquad \text{(iv)}$$

The rate of change of y with respect to t, that is, the increase
in mass per unit of time, is given by the equation

$$\frac{dy}{dt} = -\frac{Kre^{a + rt}}{(1 + e^{a + rt})^2} \qquad \text{(v)}$$

By substitution from equation (iv) this may be put in the form

$$\frac{dy}{dt} = -\frac{ry(K - y)}{K} \qquad \text{(vi)}$$

Fig. 90. Diagram of simple logistic curve.

In Fig. 90 are shown the graphs of equations (iv) and (v).
Study of these graphs, and a little elementary analysis of equations
(iv) and (vi), lead to the following statements which are important
in the understanding of the logistic:

(a) The logistic is asymptotic to a line K units above the t
axis and parallel to it.

(b) The curve has a point of inflection at the co-ordinates
t = -a/r, y = K/2.

(c) The time rate of change of the mass y is greatest at the
point of inflection, and the rate at this point is given by

$$\frac{dy}{dt} = -\frac{rK}{4}$$
(d) When K approaches infinity in equation (vi) we have the
limiting equation dy/dt = -ry. This is the differential equation for
geometric increase, and in this equation r indicates the rate of
compounding. Thus we may say that in the logistic curve the
constant r is the inherent rate of growth of the population, and
that this rate diminishes with time. That the rate does not hold
to the inherent value r is a result of the damping effect of the
factor (K - y), which measures the aggregate of forces that slow
down and finally stop the growth.

(e) The constant a (or C in equation (i)) is obviously the
constant of integration and therefore defines the relative positions of
the origin and the curve. If the origin on the time axis is
transferred to the position of the point of inflection, a becomes zero and
the curve takes the form

$$y = \frac{K}{1 + e^{rt}} \qquad \text{(vii)}$$

FITTING  THE  LOGISTIC 


The simplest method of fitting the symmetrical logistic curve
depends on the fact that equation (ii) may be changed into the form

$$\log_e \frac{K - (y - d)}{y - d} = \log_e C + rt \qquad \text{(viii)}$$

In other words log_e {[K - (y - d)]/(y - d)} is a straight line
function of time. In fitting a set of observations, therefore, we begin
by making as good a guess as we can as to the values of the upper
and lower asymptotes. The lower asymptote gives the value of d,
while the value of K is obtained by subtracting d from the upper
asymptote. We then calculate Z = [K - (y - d)]/(y - d) for
each observation and plot each value as an ordinate on arithlog
paper against the corresponding time as an abscissa. If we have
made good guesses as to the values of the upper and lower
asymptotes and if the observations are approximately symmetrical, the
plotted points should fall nearly on a straight line. If they do not,
we try new values of d or K or both until the resulting values of Z
plotted on arithlog paper are approximately fitted by a straight
line, which we determine by eye.* From any two convenient
points on this line we determine the slope m and the value of Z
when t = 0. This value, Z_0, = C while r = 2.30259 m.

As  an  example  we  shall  fit  the  growth  of  the  population  of  Sweden 
from  1750  to  1920,  as  shown  in  Table  89.  As  a first  assumption  we 
take  the  upper  asymptote  equal  to  7.66  million  population  and  the 
lower  asymptote  equal  to  1.56  million.  Therefore  d = 1.56  and 
K = 7.66 - 1.56 = 6.10. The calculation of Z is shown in Table
89. Plotting the values of Z on arithlog paper against the
corresponding values of t, as shown in Fig. 91, we see that they are
satisfactorily fitted by a straight line. The unit in which t is measured
is  one  year,  and  the  origin  the  year  1800.  To  find  the  values  of  r 


TABLE 89

Fitting of Logistic Curve to Population of Sweden: First Approximation
by Graphic Method

d = 1.56; K = 6.10

(y = observed population in millions; Z = [K - (y - d)]/(y - d);
difference = calculated - observed.)

Year    y        y - d    Z       t      e^(rt)       1 + Ce^(rt)   y calculated   Difference
1750    1.763     .203    29.0    -50    3.245749     23.999994     1.8142          .0512
1760    1.893     .333    17.3    -40    2.564791     19.174596     1.8781         -.0149
1770    2.030     .470    12.0    -30    2.026698     15.361567     1.9571         -.0729
1780    2.118     .558     9.93   -20    1.601497     12.348512     2.0540         -.0640
1790    2.158     .598     9.20   -10    1.265503      9.967595     2.1720          .0140
1800    2.347     .787     6.75     0    1.000000      8.086190     2.3144         -.0326
1810    2.378     .818     6.46    10     .7902000     6.5995073    2.4843          .1063
1820    2.585    1.025     4.95    20     .6244160     5.4247304    2.6845          .0995
1830    2.888    1.328     3.59    30     .4934136     4.4964225    2.9166          .0286
1840    3.139    1.579     2.86    40     .3898954     3.7628729    3.1811          .0421
1850    3.483    1.923     2.17    50     .3080952     3.1832211    3.4763         -.0067
1860    3.860    2.300     1.65    60     .2434568     2.7251811    3.7984         -.0616
1870    4.168    2.608     1.34    70     .1923796     2.3632384    4.1412         -.0268
1880    4.566    3.006     1.03    80     .1520184     2.0772313    4.4966         -.0694
1890    4.785    3.225      .891   90     .1201249     1.8512279    4.8551          .0701
1900    5.136    3.576      .706  100     .09492267    1.6726401    5.2069          .0709
1910    5.522    3.962      .540  110     .07500790    1.5315202    5.5430          .0210
1920    5.904    4.344      .404  120     .05927123    1.4200072    5.8558         -.0482

* Of  course  a straight  line  can  be  fitted  exactly  by  least  squares  according  to  the 
method  given  in  Chapter  XVI  if  this  refinement  is  thought  desirable.  It  should  be 
noted,  however,  that  in  fitting  this  line  by  least  squares,  points  near  either  asymptote 
will  have  undue  influence  in  determining  the  slope. 
and C, we take the value of Z at t = -50, Z_{-50} = 23, and the value
of Z at t = +120, Z_{120} = 0.42; log Z_{-50} = 1.3617278, log Z_{120} =
9.6232493 - 10.

$$m = \frac{\log Z_{120} - \log Z_{-50}}{120 - (-50)} = \frac{-1.7384785}{170} = -0.0102263$$

$$r = 2.30259\,m = -0.0235470$$

$$\log Z_0 = \log Z_{-50} + 50m = 0.8504128$$

$$C = Z_0 = 7.086190$$

Fig. 91. Plot of Z = [K - (y - d)]/(y - d) on arithlog paper in fitting logistic
curve to the population growth of Sweden.


In other words, the symmetrical logistic curve which we have fitted
to the population growth of Sweden is

$$y - 1.56 = \frac{6.10}{1 + 7.086e^{-0.0235t}} \qquad \text{(ix)}$$

where y is population in millions and t is time in years from the
year 1800.
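
The arithmetic by which r and C are obtained from two points on the straight line may be sketched as follows (Python, not part of the original text). The two values of Z are those read from the line in the text; the variable names are chosen here for illustration only.

```python
import math

z_minus50 = 23.0      # Z read from the line at t = -50
z_120 = 0.42          # Z read from the line at t = +120

# Slope of the line on arithlog paper, in common logarithms per year.
m = (math.log10(z_120) - math.log10(z_minus50)) / (120 - (-50))

# r converts the slope to natural logarithms (the 2.30259 m of the text).
r = math.log(10.0) * m

# Z at the origin t = 0 (the year 1800) gives C.
log_z0 = math.log10(z_minus50) + 50 * m
C = 10.0 ** log_z0

print(round(m, 7), round(r, 7), round(C, 4))
```

Carrying m to full precision instead of seven decimal places changes C only in the fifth decimal place, a difference of no practical consequence for a first approximation.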
Having got the equation of our curve, we next wish to calculate
from it the populations at the different census years to compare
with the observed populations. These calculations are shown in
Table 89. Since e^(rt) = 10^(mt), we multiply the successive values of
t by m, and look up the antilogarithms in a logarithm table. These
antilogarithms are the successive values of e^(rt). K is divided by
the successive values of 1 + Ce^(rt); the quotient, plus d, is the
population value from the curve for a given census year. The last column
of Table 89 shows the differences between the calculated and
observed values. The largest of these is 0.106 million or about 4
per cent. of the population of that year. We square each difference,
add, divide by the number of differences, and get the square root
of the quotient. The result, which is called the root-mean-square
deviation (and which, be it noted, is closely related to the standard
deviation) turns out to be 0.0574 millions.
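
The calculation of ordinates and of the root-mean-square deviation may be sketched thus (Python, not part of the original text). The observed populations are those of Table 89 and the constants are those of the first approximation; the exponential is computed directly instead of through a table of logarithms.

```python
import math

years = list(range(1750, 1930, 10))          # census years 1750 to 1920
observed = [1.763, 1.893, 2.030, 2.118, 2.158, 2.347, 2.378, 2.585, 2.888,
            3.139, 3.483, 3.860, 4.168, 4.566, 4.785, 5.136, 5.522, 5.904]
d, K, C, r = 1.56, 6.10, 7.086190, -0.0235470

def logistic(t):
    """Ordinate of the fitted curve, t in years from 1800."""
    return d + K / (1.0 + C * math.exp(r * t))

differences = [logistic(year - 1800) - obs
               for year, obs in zip(years, observed)]
rms = math.sqrt(sum(e * e for e in differences) / len(differences))

largest = max(differences, key=abs)
print(round(rms, 4), round(largest, 4))
```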

This is a good fit of the curve to the observations. It is
desirable, however, to get the best fitting curve. Now, as has already
been seen, the method of least squares cannot be applied to the
fitting of a curve in which, as is the case in the logistic curve, the
constants to be determined enter the expression in other than a
linear manner. However, when we already have a good fit of the
curve, this can be treated as a first approximation to the best
fitting curve sought and expanded by Taylor’s theorem.
Neglecting terms of higher order than the first we have an expression linear
in the correction terms which are to be determined. Normal
equations can then be formed in the manner already explained in
Chapter XVI and solved for the unknown correction terms. If
the second approximation to the best fitting curve thus obtained is
not close enough to suit, we may repeat the process to obtain a
third approximation and so on. The details of the work are as
follows:

Let ρ be an approximate value of r, and h be the correction to ρ (r = ρ + h)
C' be an approximate value of C, and i be the correction to C' (C = C' + i)
σ be an approximate value of d, and j be the correction to σ (d = σ + j)
K' be an approximate value of K, and k be the correction to K' (K = K' + k)

Then

$$y = \sigma + j + \frac{K' + k}{1 + (C' + i)e^{(\rho + h)t}} \qquad \text{(x)}$$

or, approximately, expanding by Taylor’s theorem and neglecting
terms of higher order than the first

$$y = \sigma + \frac{K'}{1 + C'e^{\rho t}} + h\left(\frac{-K'e^{\rho t}C't}{(1 + C'e^{\rho t})^2}\right) + i\left(\frac{-K'e^{\rho t}}{(1 + C'e^{\rho t})^2}\right) + j + k\left(\frac{1}{1 + C'e^{\rho t}}\right) \qquad \text{(xi)}$$

Let

$$\frac{-K'e^{\rho t}C't}{(1 + C'e^{\rho t})^2} = a; \quad \frac{-K'e^{\rho t}}{(1 + C'e^{\rho t})^2} = b; \quad \frac{1}{1 + C'e^{\rho t}} = c; \quad \sigma + \frac{K'}{1 + C'e^{\rho t}} - y = l \qquad \text{(xii)}$$

(In calculating these quantities we may note that a = C'tb; b = -K'e^(ρt)c^2;
l = σ + K'c - y.)

Then

$$ah + bi + j + ck + l = 0$$

From this are derived the following normal equations:

$$h\,\Sigma(a^2) + i\,\Sigma(ab) + j\,\Sigma(a) + k\,\Sigma(ac) + \Sigma(al) = 0$$
$$h\,\Sigma(ab) + i\,\Sigma(b^2) + j\,\Sigma(b) + k\,\Sigma(bc) + \Sigma(bl) = 0$$
$$h\,\Sigma(a) + i\,\Sigma(b) + jN + k\,\Sigma(c) + \Sigma(l) = 0 \qquad \text{(xiii)}$$
$$h\,\Sigma(ac) + i\,\Sigma(bc) + j\,\Sigma(c) + k\,\Sigma(c^2) + \Sigma(cl) = 0$$
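
The quantities a, b, c and l for each census year, and their totals, can be computed as in the following sketch (Python, not part of the original text). It uses the shortcut relations noted in the parenthesis above, the observed populations of Sweden, and the constants of the first approximation; the names `rho`, `c_prime`, `sigma` and `k_prime` are stand-ins chosen here for the approximate values of r, C, d and K.

```python
import math

observed = [1.763, 1.893, 2.030, 2.118, 2.158, 2.347, 2.378, 2.585, 2.888,
            3.139, 3.483, 3.860, 4.168, 4.566, 4.785, 5.136, 5.522, 5.904]
rho, c_prime, sigma, k_prime = -0.0235470, 7.086190, 1.56, 6.1

rows = []
for index, y in enumerate(observed):
    t = -50 + 10 * index                    # years from 1800
    e = math.exp(rho * t)
    c = 1.0 / (1.0 + c_prime * e)           # coefficient of k
    b = -k_prime * e * c * c                # coefficient of i
    a = c_prime * t * b                     # coefficient of h
    l = sigma + k_prime * c - y             # first approximation minus observed
    rows.append((a, b, c, l))

totals = [sum(column) for column in zip(*rows)]
for name, total in zip("abcl", totals):
    print(name, round(total, 5))
```

The totals serve as the sums of a, b, c and l entering the third normal equation, and the product-sums for the other equations are formed from the same rows.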

TABLE 90

Fitting of Logistic Curve to Population of Sweden: Second Approximation
by Least Squares. Calculation of Product-sums

ρ = -0.0235470; C' = 7.086190; σ = 1.56; K' = 6.1

Year    y        t      a             b             c            l             s
1750    1.763    -50      12.178822   -.03437340    .04166668     .05116675      12.237283
1760    1.893    -40      12.061526   -.04255293    .05215234    -.01487073      12.056254
1770    2.030    -30      11.137352   -.05238994    .06509753    -.07290507      11.077155
1780    2.118    -20       9.079659   -.06406587    .08098142    -.06401334       9.032561
1790    2.158    -10       5.505860   -.07769845    .10032510     .01398311       5.542470
1800    2.347      0       0          -.09329148    .12366764    -.03262740       -.002250
1810    2.378     10      -7.842537   -.11067354    .15152646     .10631141      -7.695374
1820    2.585     20     -18.343844   -.12943376    .18434096     .09947986     -18.189457
1830    2.888     30     -31.647599   -.14886984    .22239903     .02863408     -31.545436
1840    3.139     40     -47.611473   -.16797275    .26575439     .04210178     -47.471590
1850    3.483     50     -65.71492    -.1854732     .31414720    -.00670208     -65.59294
1860    3.860     60     -85.02076    -.1999682     .36694809    -.06161665     -84.91540
1870    4.168     70    -104.22805    -.2101231     .42314817    -.02679616    -104.04182
1880    4.566     80    -121.83133    -.2149098     .48141004    -.06939876    -121.63423
1890    4.785     90    -136.36356    -.2138174     .54018201     .07011026    -135.96709
1900    5.136    100    -146.65862    -.2069640     .59785724     .07092916    -146.19679
1910    5.522    110    -152.05365    -.1950704     .65294601     .02097066    -151.57480
1920    5.904    120    -152.47080    -.1793051     .70422178    -.04824714    -151.99414

Totals                 -1019.82392   -2.5269531    5.36877209     .10650974   -1016.87561

Product-sums

Σ(a^2)   127,921.79       Σ(ab)   207.20940        Σ(ac)   -543.70734
Σ(ab)        207.20940    Σ(b^2)     .42744085     Σ(bc)       -.96685
Σ(ac)       -543.70734    Σ(bc)     -.96685321     Σ(c^2)      2.4332851
Σ(al)         -5.310004   Σ(bl)     -.018225622    Σ(cl)        .031328884

Sum      127,579.98               206.65176                -542.20957
Σ(as)    127,579.99       Σ(bs)   206.65174        Σ(cs)   -542.20955
The calculation of the various sums is shown in Table 90. Each
item in column s is the sum of the corresponding items in columns
a, b, c, and l. In this way we have a check on the calculation of the
various product-sums entering into the normal equations.

TABLE 91

Fitting of Logistic Curve to Population of Sweden: Second Approximation
by Least Squares. Solution of Normal Equations by Doolittle Method

Reciprocal         Line           h.            i.             j.             k.             Constant
-.0000078172765    I              127,921.79    207.20940      -1019.8239     -543.70734     -5.310004
                   h =                          -.0016198132    .0079722454    .0042503106    .00004150977

                   II                            .42744085     -2.5269531     -.96685321     -.018225622
                   terms in h                   -.33564052      1.6519242      .88070431      .008601215
-10.893207         sum                           .09180033      -.8750289     -.08614890     -.009624407
                   i =                                           9.531871       .9384378       .10484066

                   III                                          18.             5.3687721      .10650974
                   terms in h                                   -8.130286      -4.3345683     -.04233266
                   terms in i                                   -8.340663       -.8211602     -.09173861
-.6540004          sum                                           1.529051        .2130436     -.02756153
                   j =                                                          -.1393306       .01802525

                   IV                                                            2.4332851      .03132888
                   terms in h                                                   -2.3109251     -.02256917
                   terms in i                                                    -.0808454     -.00903191
                   terms in j                                                    -.0296835      .00384016
                   sum                                                            .0118311      .00356796

h = -.00140006;    r = ρ + h = -.02494706
i = +.394162;      C = C' + i = 7.480352
j = +.0600439;     d = σ + j = 1.6200439
k = -.301575;      K = K' + k = 5.798425

Upper asymptote = 7.418

Check (substitution in the fourth normal equation)

-543.707 h    =  .761222
-.966853 i    = -.381097
5.36877 j     =  .322362
2.43329 k     = -.733819
Constant term =  .031329
Sum           = -.000003

The values of the correction terms may be found from the
normal equations by any of the algebraic methods for solving
simultaneous equations. However, it is usually most advantageous to
solve by a method developed by the great astronomer and
mathematician, Gauss, or by a modification of this method devised by
M. H. Doolittle, of the United States Coast and Geodetic Survey.
Both of these methods are described by Brunt (see reference 3,
Chapter XVI). The solution of the normal equations by the
Doolittle method is shown in Table 91, and the calculation of ordinates
from the new curve in Table 92. This does not differ in principle
from the calculation of ordinates of the first approximation curve
already shown in Table 89. For the new curve the root-mean-square
deviation is .0562, as compared with the value .0574 already
found for the old curve. In other words, the new curve is a slightly
better fit to the observations than the old curve, but only slightly.
This is apparent graphically in Fig. 92.

TABLE 92

Fitting of Logistic Curve to Population of Sweden: Second Approximation
by Least Squares. Calculation of Ordinates

Year    t      e^(rt)       1 + Ce^(rt)   y calculated   y observed   Difference
1750    -50    3.48112      27.0400       1.8344         1.763        +.0714
1760    -40    2.71253      21.2907       1.8923         1.893        -.0007
1770    -30    2.11364      16.8108       1.9649         2.030        -.0651
1780    -20    1.64698      13.3200       2.0553         2.118        -.0627
1790    -10    1.28335      10.5999       2.1670         2.158        +.0090
1800      0    1.00000       8.48035      2.3037         2.347        -.0433
1810     10     .779213      6.82879      2.4691         2.378        +.0911
1820     20     .607174      5.54188      2.6663         2.585        +.0813
1830     30     .473117      4.53908      2.8974         2.888        +.0094
1840     40     .368659      3.75770      3.1631         3.139        +.0241
1850     50     .287264      3.14884      3.4614         3.483        -.0216
1860     60     .223840      2.67440      3.7881         3.860        -.0719
1870     70     .174419      2.30472      4.1359         4.168        -.0321
1880     80     .135910      2.01665      4.4953         4.566        -.0707
1890     90     .105903      1.792192     4.8554         4.785        +.0704
1900    100     .0825207     1.617284     5.2053         5.136        +.0693
1910    110     .0643012     1.480996     5.5352         5.522        +.0132
1920    120     .0501044     1.374799     5.8377         5.904        -.0663

Root mean square deviation = .0562
Root mean square deviation of first approximation = .0574
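
The correction terms can also be obtained by solving the four normal equations directly. The sketch below (Python, not part of the original text) uses ordinary Gaussian elimination in place of the Doolittle arrangement; the coefficients are the product-sums printed in the tables, and `solve_linear` is a hypothetical helper name. Because the equations are nearly dependent, the last figures of the result are sensitive to rounding.

```python
def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

# Coefficients of h, i, j, k in the four normal equations.
normal = [
    [127921.79, 207.20940, -1019.8239, -543.70734],
    [207.20940, 0.42744085, -2.5269531, -0.96685321],
    [-1019.8239, -2.5269531, 18.0, 5.3687721],
    [-543.70734, -0.96685321, 5.3687721, 2.4332851],
]
# Constant terms; they are moved to the right-hand side with changed sign.
constants = [-5.310004, -0.018225622, 0.10650974, 0.03132888]

h, i, j, k = solve_linear(normal, [-value for value in constants])

r = -0.0235470 + h      # second approximation to r
C = 7.086190 + i        # second approximation to C
d = 1.56 + j            # second approximation to d
K = 6.1 + k             # second approximation to K
print(round(r, 6), round(C, 4), round(d, 4), round(K, 4), round(d + K, 3))
```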

In  some  cases  a growth  curve  is  skew,  i.  e.,  the  curvature  in  the 
early  part  of  growth  is  at  a different  rate  from  the  curvature  in  the 
latter  part  of  growth.  In  these  cases  Z plotted  on  arithlog  paper 
cannot  be  fitted  with  a straight  line,  whatever  values  of  d and  K 
are  chosen,  but  must  be  fitted  with  a parabola  of  odd  degree;  usually 
a cubic  parabola  is  sufficient.  If  it  is  not,  a fifth  order  parabola  may 
be used. The remainder of the work of fitting a skew logistic
follows the same principles as have been set forth above in detail for
the  symmetrical  logistic. 
AIDS TO BIOMETRIC WORKERS

The following tables are indispensable to the biometric worker.

1. Pearson, K. (Editor): Tables for Statisticians and Biometricians, Cambridge
University Press, 1914.

2. Barlow’s Tables of Squares, Cubes, Square Roots, Cube Roots, Reciprocals,
London (E. & F. N. Spon, Ltd.), 1919.

3. Bruhns, C.: Neues logarithmisch-trigonometrisches Handbuch auf sieben
Decimalen, Leipzig (Tauchnitz), 1919. (Any other 7-place table will do, but Bruhns
is surpassed by none.)

4. Miner, J. R.: Tables of $\sqrt{1 - r^2}$ and $1 - r^2$ for Use in Partial Correlation and in
Trigonometry, Baltimore (The Johns Hopkins Press), 1922.

In addition to the above, the following will be found useful:

Glover, J. W.: Tables of Applied Mathematics in Finance, Insurance, Statistics,
Ann Arbor, Mich. (George Wahr), 1923. (This contains what appears to be
a photographic reprint of Bruhns’ 7-place logarithms of numbers.)

Carr, G. S.: A Synopsis of Elementary Results in Pure Mathematics: Containing
Propositions, Formulas, and Methods of Analysis, with Abridged
Demonstrations, London (Francis Hodgson), 1886. (This book is out of print and,
therefore, difficult to acquire, but to him who has it it is an invaluable desk
companion.)

APPENDIX  II 

MATHEMATICAL  FORMULAS  AND  CONSTANTS 


Multiplication 


(1) 1a = a; 3a = a + a + a

(2) (a + b) c = ac + bc

(3) (a - b) c = ac - bc

(4) (a + b) (c + d) = (a + b) c + (a + b) d = ac + ad + bc + bd

(5) (a - b) (c + d) = (a - b) c + (a - b) d = ac + ad - bc - bd

(6) (a + b) (c - d) = (a + b) c - (a + b) d = ac - ad + bc - bd

(7) (a - b) (c - d) = (a - b) c - (a - b) d = ac - ad - bc + bd

(8) (a + 1) b = ab + b

(9) (a - 1) b = ab - b

(10) (a + b) (c + 1) = ac + bc + a + b

(11) (a + b) (c - 1) = ac + bc - a - b

(12) (a - b) (c + 1) = ac - bc + a - b

(13) (a - b) (c - 1) = ac - bc - a + b

(14) ab = ba

(15) a . 0 = 0

(16) (+ a) . (+ b) or (- a) . (- b) = + ab

(17) (+ a) . (- b) or (- a) . (+ b) = - ab


Division 


(1) 

(2) 

(3) 

(4) 


a 

. b or 

II 

<3  ' 

b 

b 

ab 

a 

7 b 

— 

■ • b — 

. a 

c 

c 

c 

a 

a 1 

a 

b 

. c — 

b c 

be 

a 


a • c 


a : c 


(5)-f 

<6>t 


c 

~d 

c 

~d 


b . c b : c 
ac 


(7)  — + — 

(8)  — - A 


(9)  a + 

(10)  a — 


c 

b 


bd 

ad 

be 

a + b 
c 

a — b 
c 

ac  4-  b 
c 

ac  — b 


(11)

(12) 

(13) 

(14) 

(15) 

(16) 

(17) 

(18) 
(19) 


a , „ a + b 

b 1 = — 5 — 

b b 

a i _ a — b 

~b  Y~ 


1 + 1 


a b 

J_  _ JL_  = 

a b 

a c 

J + ~d 


a 

J 


a 


a • + b 

a + b 
2 

a + b 


c 
~d 

+ 1 - 


a -f~  b 
ab 

b — a 
ab 

ad  + be 
bd 

ad  — be 
bd~ 
2a  + b 
a b 


, a — b 

H — = CL 


2 

a — b 


a — b 
ab 


= b 


(20) 

(21) 

(22) 

(23) 

(24) 


1 


a 


1 

b 

1 

b 


o 

— = o 
a 


a 

= GO 

0 


+ CL 

+ b 


or 


4~  CL 

- b 


or 


- a 

Tb 


b -\-  a 
b — a 


Powers 

$a^4 = aaaa$; $a^1 = a$; $1^a = 1$

(1) $(+a)^n = +a^n$

(2) $(-a)^n = +a^n$, if n is an even number

(3) $(-a)^n = -a^n$, if n is an odd number

(4) $(ab)^m = a^m b^m$

(5) $(a : b)^m = \dfrac{a^m}{b^m}$

(6) $a^m \cdot a^n = a^{m + n}$

(7) $a^m : a^n = \dfrac{a^m}{a^n} = a^{m - n}$

(8) $a^{n + 1} : a^n = a$

(11) $(a^m)^n = a^{mn}$

(12) $3a^0 = 3$; $(3a)^0 = 1$

(13) $a^{-1} = \dfrac{1}{a}$


(15)  (a’*  + 1)2 

(16)  ax . a~y 


(17)  a x a y 


m x 

any 

a2n . a 2 

ax~y 

Q—  (x  + y) 


ax  + y 


(20) $(a^{-x})^y = a^{-xy}$

(21) $(a^x)^{-y} = a^{-xy}$

(22) $(a^{-x})^{-y} = a^{xy}$

(23) $(a + b)^2 = a^2 + 2ab + b^2$

(24) $(a - b)^2 = a^2 - 2ab + b^2$

(25) $a^2 - b^2 = (a + b)(a - b)$

(26) $(a + b + c)^2 = a^2 + 2ab + b^2 + 2ac + 2bc + c^2$

(27) $(a + b)^3 = a^3 + 3a^2b + 3ab^2 + b^3$

(28) $(a - b)^3 = a^3 - 3a^2b + 3ab^2 - b^3$

(29) $a^3 + b^3 = (a + b)(a^2 - ab + b^2)$

(30) $a^3 - b^3 = (a - b)(a^2 + ab + b^2)$

(31) $(a + b)^4 = a^4 + 4a^3b + 6a^2b^2 + 4ab^3 + b^4$

(32) $(a + b)^5 = a^5 + 5a^4b + 10a^3b^2 + 10a^2b^3 + 5ab^4 + b^5$


Roots 

(1) $\sqrt[n]{a} = b$; $\sqrt[n]{a^n} = a$; $(\sqrt[n]{a})^n = a$

(2) $\sqrt[n]{a^{mn}} = a^m$

(3) $\sqrt[n]{ab} = \sqrt[n]{a} \cdot \sqrt[n]{b}$


n n 


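(8) $\sqrt{a^2} = \pm a$; $\sqrt{(a + b)^2} = \pm (a + b)$

Incorrect: $\sqrt{a^2 + b^2} = a + b$; and $\sqrt[3]{a^3 + b^3} = a + b$

(9) $(\sqrt{a} + \sqrt{b})(\sqrt{a} - \sqrt{b}) = a - b$

(10) $(a + \sqrt{b})(a - \sqrt{b}) = a^2 - b$

(11) $(\sqrt{a} + b)(\sqrt{a} - b) = a - b^2$

(12) $\sqrt{1 + x} = 1 + \frac{1}{2}x - \frac{1}{8}x^2 + \frac{1}{16}x^3 - \frac{5}{128}x^4 + \frac{7}{256}x^5 - \cdots$

(13) $\sqrt{1 - x} = 1 - \frac{1}{2}x - \frac{1}{8}x^2 - \frac{1}{16}x^3 - \frac{5}{128}x^4 - \frac{7}{256}x^5 - \cdots$

(14) $\sqrt[3]{1 + x} = 1 + \frac{1}{3}x - \frac{1}{9}x^2 + \frac{5}{81}x^3 - \frac{10}{243}x^4 + \cdots$

(15) $\sqrt[3]{1 - x} = 1 - \frac{1}{3}x - \frac{1}{9}x^2 - \frac{5}{81}x^3 - \frac{10}{243}x^4 - \frac{22}{729}x^5 - \cdots$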
Fractional Powers


(1) 

(2) 

(3) 

(4) 


m x 

(f  = Vo” 


a 


m 

n 


P 

aq 


m p 

a"  q 


m p m p 

an  : aq  = an  q 


m mm 

(a  b)n  = an  . bn 


(5) 

(6) 


Logarithms 

(1) $\log_a a = 1$; $\log 1 = 0$.

(2) $\log MN = \log M + \log N$.

(3) $\log \dfrac{M}{N} = \log M - \log N$.

(4) $\log (M)^n = n \log M$.

(5) $\log \sqrt[n]{M} = \dfrac{1}{n} \log M$.


Proportion 

From  a : b = c : d it  follows: 

(1) a : c = b : d
b : a = d : c
b : d = a : c
c : a = d : b
c : d = a : b
d : b = c : a
d : c = b : a

(2) ad = bc

(3) $a = \dfrac{bc}{d}$; $b = \dfrac{ad}{c}$; $c = \dfrac{ad}{b}$; $d = \dfrac{bc}{a}$

(4) ma : mb = c : d
ma : b = mc : d, etc.

(5) $\dfrac{a}{n} : \dfrac{b}{n} = c : d$
$\dfrac{a}{n} : b = \dfrac{c}{n} : d$, etc.

(6) $a^n : b^n = c^n : d^n$

(7) $\sqrt[n]{a} : \sqrt[n]{b} = \sqrt[n]{c} : \sqrt[n]{d}$


Differential  Coefficients  of  Simple  Functions 
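(8) $y = \log_e x$, $\dfrac{dy}{dx} = \dfrac{1}{x}$

$y = \log x$, $\dfrac{dy}{dx} = M \cdot \dfrac{1}{x}$, where M is the modulus of common logarithms

(9) $y = \sin x$, $\dfrac{dy}{dx} = \cos x$

(10) $y = \cos x$, $\dfrac{dy}{dx} = -\sin x$

(11) $y = \operatorname{tg} x$, $\dfrac{dy}{dx} = \dfrac{1}{\cos^2 x}$

(12) $y = \operatorname{ctg} x$, $\dfrac{dy}{dx} = -\dfrac{1}{\sin^2 x}$

(13) $y = \operatorname{arc\,sin} x$, $\dfrac{dy}{dx} = \dfrac{1}{\sqrt{1 - x^2}}$

(14) $y = \operatorname{arc\,cos} x$, $\dfrac{dy}{dx} = -\dfrac{1}{\sqrt{1 - x^2}}$

(15) $y = \operatorname{arc\,tg} x$, $\dfrac{dy}{dx} = \dfrac{1}{1 + x^2}$

(16) $y = \operatorname{arc\,ctg} x$, $\dfrac{dy}{dx} = -\dfrac{1}{1 + x^2}$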

Simple  Integrals 
(1)  $\int a \, dx = ax + C$

(2)  $\int a x^n \, dx = \dfrac{a x^{n+1}}{n+1} + C$

(3)  $\int e^x \, dx = e^x + C$

(4)  $\int \dfrac{1}{x} \, dx = \log_e x + C$

(5)  $\int a^x \, dx = \dfrac{a^x}{\log_e a} + C$

(6)  $\int \sin x \, dx = -\cos x + C$

(7)  $\int \cos x \, dx = \sin x + C$

(8)  $\int a \cos x \, dx = a \cdot \sin x + C$

(9)  $\int \dfrac{1}{\cos^2 x} \, dx = \operatorname{tg} x + C$

(10)  $\int \dfrac{1}{\sin^2 x} \, dx = -\operatorname{ctg} x + C$

(11)  $\int \dfrac{1}{\sqrt{1 - x^2}} \, dx = \operatorname{arc\,sin} x + C = -\operatorname{arc\,cos} x + C'$

(12)  $\int \dfrac{1}{1 + x^2} \, dx = \operatorname{arc\,tg} x + C = -\operatorname{arc\,ctg} x + C'$

Constants 

                                                     Value          log.
Base of Napierian logarithms                  e = 2.7182818      0.4342945
Log e = Modulus of common logarithms          M = 0.4342945      9.6377843 - 10
Radius reduced to seconds                         206264.8       5.3144251
Radius reduced to minutes                         3437.7468      3.5362739
Radius reduced to degrees                         57.29578       1.7581226
360 degrees expressed in seconds                  1296000        6.1126050
360 degrees expressed in minutes                  21600          4.3344538
360 degrees expressed in degrees                  360            2.5563025
Diameter 1, circumference                     π = 3.14159265     0.4971499
                                            1/π = 0.3183099      9.5028501 - 10
                                             π² = 9.8696044      0.9942997
                                             √π = 1.7724539      0.2485749
                                        ∛(π/6)                   9.9063329 - 10
TABLES  FOR  ESTIMATING  THE  SIGNIFICANCE  OF 

DEVIATIONS 

TABLE  A 

Showing the Probability of Occurrence of Statistical Deviations of Different Magnitudes Relative to the Probable Error

Column 1: Deviation ÷ P. E.
Column 2: Probable occurrence of a deviation as great as or greater than designated one in 100 trials.
Column 3: Odds against the occurrence of a deviation as great as or greater than the designated one.

Deviation     Probable occurrence     Odds against
 ÷ P. E.        in 100 trials
   1.0            50.00                   1.00 to 1
   1.1            45.81                   1.18 to 1
   1.2            41.83                   1.39 to 1
   1.3            38.06                   1.63 to 1
   1.4            34.50                   1.90 to 1
   1.5            31.17                   2.21 to 1
   1.6            28.05                   2.57 to 1
   1.7            25.15                   2.98 to 1
   1.8            22.47                   3.45 to 1
   1.9            20.00                   4.00 to 1
   2.0            17.73                   4.64 to 1
   2.1            15.67                   5.38 to 1
   2.2            13.78                   6.25 to 1
   2.3            12.08                   7.28 to 1
   2.4            10.55                   8.48 to 1
   2.5             9.18                   9.90 to 1
   2.6             7.95                  11.58 to 1
   2.7             6.86                  13.58 to 1
   2.8             5.89                  15.96 to 1
   2.9             5.05                  18.82 to 1
   3.0             4.30                  22.24 to 1
   3.1             3.65                  26.37 to 1
   3.2             3.09                  31.36 to 1
   3.3             2.60                  37.42 to 1
   3.4             2.18                  44.80 to 1
   3.5             1.82                  53.82 to 1
   3.6             1.52                  64.89 to 1
   3.7             1.26                  78.53 to 1
   3.8             1.04                  95.38 to 1
   3.9              .853                116.3 to 1
   4.0              .698                142.3 to 1
   4.1              .569                174.9 to 1
   4.2              .461                215.8 to 1
   4.3              .373                267.2 to 1
   4.4              .300                332.4 to 1
   4.5              .240                415.0 to 1
   4.6              .192                520.4 to 1
   4.7              .152                655.3 to 1
   4.8              .121                828.3 to 1
   4.9              .0950             1,052 to 1
   5.0              .0745             1,341 to 1
   6.0              .0052            19,300 to 1
   7.0              .00023          427,000 to 1
   8.0              .0000068     14,700,000 to 1
   9.0              .00000013   730,000,000 to 1
  10.0              .0000000015  65,000,000,000 to 1
TABLE  B

Showing the Probability of Occurrence of Statistical Deviations of Different Magnitudes Relative to the Standard Deviation

Column 1: Deviation ÷ σ.
Column 2: Probable occurrence of a deviation as great as or greater than designated one in 100 trials.
Column 3: Odds against the occurrence of a deviation as great as or greater than the designated one.

Deviation     Probable occurrence     Odds against
   ÷ σ          in 100 trials
 0.67449          50.00                   1.00 to 1
 0.7              48.39                   1.07 to 1
 0.8              42.37                   1.36 to 1
 0.9              36.81                   1.72 to 1
 1.0              31.73                   2.15 to 1
 1.1              27.13                   2.69 to 1
 1.2              23.01                   3.35 to 1
 1.3              19.36                   4.17 to 1
 1.4              16.15                   5.19 to 1
 1.5              13.36                   6.48 to 1
 1.6              10.96                   8.12 to 1
 1.7               8.91                  10.22 to 1
 1.8               7.19                  12.92 to 1
 1.9               5.74                  16.41 to 1
 2.0               4.55                  20.98 to 1
 2.1               3.57                  26.99 to 1
 2.2               2.78                  34.96 to 1
 2.3               2.14                  45.62 to 1
 2.4               1.64                  60.00 to 1
 2.5               1.24                  79.52 to 1
 2.6                .932                106.3 to 1
 2.7                .693                143.2 to 1
 2.8                .511                194.7 to 1
 2.9                .373                267.0 to 1
 3.0                .270                369.4 to 1
 3.1                .194                515.7 to 1
 3.2                .137                726.7 to 1
 3.3                .0967             1,033 to 1
 3.4                .0674             1,483 to 1
 3.5                .0465             2,149 to 1
 3.6                .0318             3,142 to 1
 3.7                .0216             4,637 to 1
 3.8                .0145             6,915 to 1
 3.9                .00962           10,390 to 1
 4.0                .00634           15,770 to 1
 5.0                .0000573      1,744,000 to 1
 6.0                .00000020   500,000,000 to 1
 7.0                .00000000026  400,000,000,000 to 1
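
The entries of Tables A and B can be reproduced from the normal curve. The sketch below is an illustration added here, under the assumption that the tabulated probability is the two-sided chance of a deviation at least as large as the one designated. The function names are hypothetical; the factor 0.67449 converting probable errors to standard deviations is taken from the first row of Table B.

```python
import math

PE_IN_SIGMA = 0.67449  # one probable error expressed in standard deviations

def occurrence_per_100(dev_in_sigma):
    """Two-sided chance, per 100 trials, of a deviation at least this large."""
    return 100.0 * math.erfc(dev_in_sigma / math.sqrt(2.0))

def odds_against(dev_in_sigma):
    """Odds against such a deviation, expressed as 'so many to 1'."""
    p = occurrence_per_100(dev_in_sigma) / 100.0
    return (1.0 - p) / p

if __name__ == "__main__":
    # Table B style row: deviation measured in standard deviations.
    print(round(occurrence_per_100(2.0), 2), round(odds_against(2.0), 2))
    # Table A style row: deviation measured in probable errors.
    z = 3.0 * PE_IN_SIGMA
    print(round(occurrence_per_100(z), 2), round(odds_against(z), 2))
```

Small differences in the last printed figure are to be expected, since the tables were computed by hand from rounded values.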

APPENDIX  IV 


TABLE  OF  AREAS  AND  ORDINATES  OF  THE  NORMAL  CURVE 


Area = area from middle of curve (x/σ = 0) to indicated x/σ.  Ordinate = ordinate at x/σ.

 x/σ     Area    Ordinate        x/σ     Area    Ordinate
 .00    .0000    .3989           .35    .1368    .3752
 .01    .0040    .3989           .36    .1406    .3739
 .02    .0080    .3989           .37    .1443    .3725
 .03    .0120    .3988           .38    .1480    .3712
 .04    .0160    .3986           .39    .1517    .3697
 .05    .0199    .3984           .40    .1554    .3683
 .06    .0239    .3982           .41    .1591    .3668
 .07    .0279    .3980           .42    .1628    .3653
 .08    .0319    .3977           .43    .1664    .3637
 .09    .0359    .3973           .44    .1700    .3621
 .10    .0398    .3970           .45    .1736    .3605
 .11    .0438    .3965           .46    .1772    .3589
 .12    .0478    .3961           .47    .1808    .3572
 .13    .0517    .3956           .48    .1844    .3555
 .14    .0557    .3951           .49    .1879    .3538
 .15    .0596    .3945           .50    .1915    .3521
 .16    .0636    .3939           .51    .1950    .3503
 .17    .0675    .3932           .52    .1985    .3485
 .18    .0714    .3925           .53    .2019    .3467
 .19    .0753    .3918           .54    .2054    .3448
 .20    .0793    .3910           .55    .2088    .3429
 .21    .0832    .3902           .56    .2123    .3410
 .22    .0871    .3894           .57    .2157    .3391
 .23    .0910    .3885           .58    .2190    .3372
 .24    .0948    .3876           .59    .2224    .3352
 .25    .0987    .3867           .60    .2257    .3332
 .26    .1026    .3857           .61    .2291    .3312
 .27    .1064    .3847           .62    .2324    .3292
 .28    .1103    .3836           .63    .2357    .3271
 .29    .1141    .3825           .64    .2389    .3251
 .30    .1179    .3814           .65    .2422    .3230
 .31    .1217    .3802           .66    .2454    .3209
 .32    .1255    .3790           .67    .2486    .3187
 .33    .1293    .3778           .68    .2517    .3166
 .34    .1331    .3765           .69    .2549    .3144

AREAS  AND  ORDINATES  OF  THE  NORMAL  CURVE  (Continued)

 x/σ     Area    Ordinate        x/σ     Area    Ordinate
 .70    .2580    .3123          1.10    .3643    .2179
 .71    .2611    .3101          1.11    .3665    .2155
 .72    .2642    .3079          1.12    .3686    .2131
 .73    .2673    .3056          1.13    .3708    .2107
 .74    .2703    .3034          1.14    .3729    .2083
 .75    .2734    .3011          1.15    .3749    .2059
 .76    .2764    .2989          1.16    .3770    .2036
 .77    .2794    .2966          1.17    .3790    .2012
 .78    .2823    .2943          1.18    .3810    .1989
 .79    .2852    .2920          1.19    .3830    .1965
 .80    .2881    .2897          1.20    .3849    .1942
 .81    .2910    .2874          1.21    .3869    .1919
 .82    .2939    .2850          1.22    .3888    .1895
 .83    .2967    .2827          1.23    .3907    .1872
 .84    .2995    .2803          1.24    .3925    .1849
 .85    .3023    .2780          1.25    .3944    .1826
 .86    .3051    .2756          1.26    .3962    .1804
 .87    .3078    .2732          1.27    .3980    .1781
 .88    .3106    .2709          1.28    .3997    .1758
 .89    .3133    .2685          1.29    .4015    .1736
 .90    .3159    .2661          1.30    .4032    .1714
 .91    .3186    .2637          1.31    .4049    .1691
 .92    .3212    .2613          1.32    .4066    .1669
 .93    .3238    .2589          1.33    .4082    .1647
 .94    .3264    .2565          1.34    .4099    .1626
 .95    .3289    .2541          1.35    .4115    .1604
 .96    .3315    .2516          1.36    .4131    .1582
 .97    .3340    .2492          1.37    .4147    .1561
 .98    .3365    .2468          1.38    .4162    .1539
 .99    .3389    .2444          1.39    .4177    .1518
1.00    .3413    .2420          1.40    .4192    .1497
1.01    .3438    .2396          1.41    .4207    .1476
1.02    .3461    .2371          1.42    .4222    .1456
1.03    .3485    .2347          1.43    .4236    .1435
1.04    .3508    .2323          1.44    .4251    .1415
1.05    .3531    .2299          1.45    .4265    .1394
1.06    .3554    .2275          1.46    .4279    .1374
1.07    .3577    .2251          1.47    .4292    .1354
1.08    .3599    .2227          1.48    .4306    .1334
1.09    .3621    .2203          1.49    .4319    .1315
AREAS  AND  ORDINATES  OF  THE  NORMAL  CURVE  (Continued)

 x/σ     Area    Ordinate        x/σ     Area    Ordinate
1.50    .4332    .1295          1.90    .4713    .0656
1.51    .4345    .1276          1.91    .4719    .0644
1.52    .4357    .1257          1.92    .4726    .0632
1.53    .4370    .1238          1.93    .4732    .0620
1.54    .4382    .1219          1.94    .4738    .0608
1.55    .4394    .1200          1.95    .4744    .0596
1.56    .4406    .1182          1.96    .4750    .0584
1.57    .4418    .1163          1.97    .4756    .0573
1.58    .4429    .1145          1.98    .4761    .0562
1.59    .4441    .1127          1.99    .4767    .0551
1.60    .4452    .1109          2.00    .4772    .0540
1.61    .4463    .1092          2.01    .4778    .0529
1.62    .4474    .1074          2.02    .4783    .0519
1.63    .4484    .1057          2.03    .4788    .0508
1.64    .4495    .1040          2.04    .4793    .0498
1.65    .4505    .1023          2.05    .4798    .0488
1.66    .4515    .1006          2.06    .4803    .0478
1.67    .4525    .0989          2.07    .4808    .0468
1.68    .4535    .0973          2.08    .4812    .0459
1.69    .4545    .0957          2.09    .4817    .0449
1.70    .4554    .0940          2.10    .4821    .0440
1.71    .4564    .0925          2.11    .4826    .0431
1.72    .4573    .0909          2.12    .4830    .0422
1.73    .4582    .0893          2.13    .4834    .0413
1.74    .4591    .0878          2.14    .4838    .0404
1.75    .4599    .0863          2.15    .4842    .0395
1.76    .4608    .0848          2.16    .4846    .0387
1.77    .4616    .0833          2.17    .4850    .0379
1.78    .4625    .0818          2.18    .4854    .0371
1.79    .4633    .0804          2.19    .4857    .0363
1.80    .4641    .0790          2.20    .4861    .0355
1.81    .4649    .0775          2.21    .4864    .0347
1.82    .4656    .0761          2.22    .4868    .0339
1.83    .4664    .0748          2.23    .4871    .0332
1.84    .4671    .0734          2.24    .4875    .0325
1.85    .4678    .0721          2.25    .4878    .0317
1.86    .4686    .0707          2.26    .4881    .0310
1.87    .4693    .0694          2.27    .4884    .0303
1.88    .4699    .0681          2.28    .4887    .0297
1.89    .4706    .0669          2.29    .4890    .0290
AREAS  AND  ORDINATES  OF  THE  NORMAL  CURVE  (Continued)

 x/σ     Area    Ordinate        x/σ     Area    Ordinate
2.30    .4893    .0283          2.70    .4965    .0104
2.31    .4896    .0277          2.71    .4966    .0101
2.32    .4898    .0270          2.72    .4967    .0099
2.33    .4901    .0264          2.73    .4968    .0096
2.34    .4904    .0258          2.74    .4969    .0093
2.35    .4906    .0252          2.75    .4970    .0091
2.36    .4909    .0246          2.76    .4971    .0088
2.37    .4911    .0241          2.77    .4972    .0086
2.38    .4913    .0235          2.78    .4973    .0084
2.39    .4916    .0229          2.79    .4974    .0081
2.40    .4918    .0224          2.80    .4974    .0079
2.41    .4920    .0219          2.81    .4975    .0077
2.42    .4922    .0213          2.82    .4976    .0075
2.43    .4925    .0208          2.83    .4977    .0073
2.44    .4927    .0203          2.84    .4977    .0071
2.45    .4929    .0198          2.85    .4978    .0069
2.46    .4931    .0194          2.86    .4979    .0067
2.47    .4932    .0189          2.87    .4979    .0065
2.48    .4934    .0184          2.88    .4980    .0063
2.49    .4936    .0180          2.89    .4981    .0061
2.50    .4938    .0175          2.90    .4981    .0060
2.51    .4940    .0171          2.91    .4982    .0058
2.52    .4941    .0167          2.92    .4982    .0056
2.53    .4943    .0163          2.93    .4983    .0055
2.54    .4945    .0158          2.94    .4984    .0053
2.55    .4946    .0154          2.95    .4984    .0051
2.56    .4948    .0151          2.96    .4985    .0050
2.57    .4949    .0147          2.97    .4985    .0048
2.58    .4951    .0143          2.98    .4986    .0047
2.59    .4952    .0139          2.99    .4986    .0046
2.60    .4953    .0136          3.00    .4987    .0044
2.61    .4955    .0132          3.01    .4987    .0043
2.62    .4956    .0129          3.02    .4987    .0042
2.63    .4957    .0126          3.03    .4988    .0040
2.64    .4959    .0122          3.04    .4988    .0039
2.65    .4960    .0119          3.05    .4989    .0038
2.66    .4961    .0116          3.06    .4989    .0037
2.67    .4962    .0113          3.07    .4989    .0036
2.68    .4963    .0110          3.08    .4990    .0035
2.69    .4964    .0107          3.09    .4990    .0034
AREAS  AND  ORDINATES  OF  THE  NORMAL  CURVE  (Continued)

 x/σ     Area    Ordinate        x/σ     Area    Ordinate
3.10    .4990    .0033          3.50    .4998    .0009
3.11    .4991    .0032          3.51    .4998    .0008
3.12    .4991    .0031          3.52    .4998    .0008
3.13    .4991    .0030          3.53    .4998    .0008
3.14    .4992    .0029          3.54    .4998    .0008
3.15    .4992    .0028          3.55    .4998    .0007
3.16    .4992    .0027          3.56    .4998    .0007
3.17    .4992    .0026          3.57    .4998    .0007
3.18    .4993    .0025          3.58    .4998    .0007
3.19    .4993    .0025          3.59    .4998    .0006
3.20    .4993    .0024          3.60    .4998    .0006
3.21    .4993    .0023          3.61    .4998    .0006
3.22    .4994    .0022          3.62    .4999    .0006
3.23    .4994    .0022          3.63    .4999    .0005
3.24    .4994    .0021          3.64    .4999    .0005
3.25    .4994    .0020          3.65    .4999    .0005
3.26    .4994    .0020          3.66    .4999    .0005
3.27    .4995    .0019          3.67    .4999    .0005
3.28    .4995    .0018          3.68    .4999    .0005
3.29    .4995    .0018          3.69    .4999    .0004
3.30    .4995    .0017          3.70    .4999    .0004
3.31    .4995    .0017          3.71    .4999    .0004
3.32    .4995    .0016          3.72    .4999    .0004
3.33    .4996    .0016          3.73    .4999    .0004
3.34    .4996    .0015          3.74    .4999    .0004
3.35    .4996    .0015          3.75    .4999    .0004
3.36    .4996    .0014          3.76    .4999    .0003
3.37    .4996    .0014          3.77    .4999    .0003
3.38    .4996    .0013          3.78    .4999    .0003
3.39    .4997    .0013          3.79    .4999    .0003
3.40    .4997    .0012          3.80    .4999    .0003
3.41    .4997    .0012          3.81    .4999    .0003
3.42    .4997    .0012          3.82    .4999    .0003
3.43    .4997    .0011          3.83    .4999    .0003
3.44    .4997    .0011          3.84    .4999    .0003
3.45    .4997    .0010          3.85    .4999    .0002
3.46    .4997    .0010          3.86    .4999    .0002
3.47    .4997    .0010          3.87    .4999    .0002
3.48    .4997    .0009          3.88    .4999    .0002
3.49    .4998    .0009          3.89    .4999    .0002
AREAS  AND  ORDINATES  OF  THE  NORMAL  CURVE  (Concluded)

 x/σ     Area    Ordinate        x/σ     Area    Ordinate
3.90    .5000    .0002          4.10    .5000    .0001
3.91    .5000    .0002          4.11    .5000    .0001
3.92    .5000    .0002          4.12    .5000    .0001
3.93    .5000    .0002          4.13    .5000    .0001
3.94    .5000    .0002          4.14    .5000    .0001
3.95    .5000    .0002          4.15    .5000    .0001
3.96    .5000    .0002          4.16    .5000    .0001
3.97    .5000    .0002          4.17    .5000    .0001
3.98    .5000    .0001          4.18    .5000    .0001
3.99    .5000    .0001          4.19    .5000    .0001
4.00    .5000    .0001          4.20    .5000    .0001
4.01    .5000    .0001          4.21    .5000    .0001
4.02    .5000    .0001          4.22    .5000    .0001
4.03    .5000    .0001          4.23    .5000    .0001
4.04    .5000    .0001          4.24    .5000    .0000
4.05    .5000    .0001
4.06    .5000    .0001
4.07    .5000    .0001
4.08    .5000    .0001
4.09    .5000    .0001
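
Any entry of the table of areas and ordinates can be recomputed directly. The sketch below is an illustration added here, assuming the tabulated curve is the normal curve of unit standard deviation and unit total area; the function names are hypothetical.

```python
import math

def area_from_middle(x_over_sigma):
    """Area under the normal curve between the middle (0) and x/sigma."""
    return 0.5 * math.erf(x_over_sigma / math.sqrt(2.0))

def ordinate(x_over_sigma):
    """Height of the normal curve at x/sigma."""
    return math.exp(-0.5 * x_over_sigma ** 2) / math.sqrt(2.0 * math.pi)

if __name__ == "__main__":
    for z in (0.0, 1.0, 1.96, 3.0):
        print(f"{z:4.2f}  {area_from_middle(z):.4f}  {ordinate(z):.4f}")
```

Doubling the area gives the proportion of cases lying within ±x/σ of the mean.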

APPENDIX  V 


SUMS  OF  LOGARITHMS 


Table  of  the  Sums  of  the  Logarithms  of  the  Natural  Numbers  from  1 to  100 


  x.      S (log x).      S (x log x).      S (log x)².
   1       0.0000000        0.0000000        0.0000000
   2       0.3010300        0.6020600        0.0906191
   3       0.7781513        2.0334238        0.3182638
   4       1.3802112        4.4416637        0.6807400
   5       2.0791812        7.9365137        1.1692991
   6       2.8573325       12.6054212        1.7748184
   7       3.7024305       18.5211075        2.4890091
   8       4.6055205       25.7458274        3.3045806
   9       5.5597630       34.3340100        4.2151594
  10       6.5597630       44.3340100        5.2151594
  11       7.6011557       55.7893295        6.2996581
  12       8.6803370       68.7395045        7.4642903
  13       9.7942803       83.2207681        8.7051601
  14      10.9404084       99.2665606       10.0187696
  15      12.1164996      116.9079295       11.4019602
  16      13.3206196      136.1738492       12.8518651
  17      14.5510685      157.0914808       14.3658697
  18      15.8063410      179.6863859       15.9415788
  19      17.0850946      203.9827044       17.5767895
  20      18.3861246      230.0033043       19.2694686
  21      19.7083439      257.7699095       21.0177324
  22      21.0507666      287.3032084       22.8198311
  23      22.4124944      318.6229487       24.6741338
  24      23.7927057      351.7480185       26.5791169
  25      25.1906457      386.6965187       28.5333531
  26      26.6056190      423.4858257       30.5355027
  27      28.0369828      462.1326474       32.5843049
  28      29.4841408      502.6530722       34.6785713
  29      30.9465388      545.0626142       36.8171792
  30      32.4236601      589.3762518       38.9990664
  31      33.9150218      635.6084643       41.2232261
  32      35.4201717      683.7732636       43.4887026
  33      36.9386857      733.8842237       45.7945871
  34      38.4701646      785.9545068       48.1400148
  35      40.0142326      839.9968884       50.5241609
SUMS  OF  LOGARITHMS  (Continued)

  x.      S (log x).      S (x log x).      S (log x)².
  36      41.5705351      896.0237784       52.9462384
  37      43.1387369      954.0472422       55.4054951
  38      44.7185205    1,014.0790189       57.9012113
  39      46.3095851    1,076.1305385       60.4326979
  40      47.9116451    1,140.2129382       62.9992941
  41      49.5244289    1,206.3370763       65.6003659
  42      51.1476782    1,274.5135465       68.2353041
  43      52.7811467    1,344.7526901       70.9035233
  44      54.4245993    1,417.0646079       73.6044600
  45      56.0778119    1,491.4591710       76.3375716
  46      57.7405697    1,567.9460313       79.1023352
  47      59.4126676    1,646.5346306       81.8982465
  48      61.0939088    1,727.2342100       84.7248186
  49      62.7841049    1,810.0538179       87.5815814
  50      64.4830749    1,895.0023181       90.4680804
  51      66.1906450    1,982.0883971       93.3838763
  52      67.9066484    2,071.3205710       96.3285438
  53      69.6309243    2,162.7071920       99.3016711
  54      71.3633180    2,256.2564551      102.3028592
  55      73.1036807    2,351.9764030      105.3317215
  56      74.8518687    2,449.8749325      108.3878829
  57      76.6077436    2,549.9597993      111.4709794
  58      78.3711716    2,652.2386229      114.5806577
  59      80.1420236    2,756.7188916      117.7165745
  60      81.9201748    2,863.4079666      120.8783964
  61      83.7055047    2,972.3130866      124.0657990
  62      85.4978964    3,083.4413713      127.2784670
  63      87.2972369    3,196.7998259      130.5160934
  64      89.1034169    3,312.3953443      133.7783793
  65      90.9163303    3,430.2347124      137.0650341
  66      92.7358742    3,550.3246122      140.3757742
  67      94.5619490    3,672.6716240      143.7103234
  68      96.3944579    3,797.2822300      147.0684123
  69      98.2333070    3,924.1628173      150.4497786
  70     100.0784050    4,053.3196801      153.8541654
  71     101.9296634    4,184.7590228      157.2813229
  72     103.7869959    4,318.4869626      160.7310069
  73     105.6503187    4,454.5095314      164.2029790
  74     107.5195505    4,592.8326786      167.6970062
  75     109.3946117    4,733.4622734      171.2128609
  76     111.2754253    4,876.4041064      174.7503207
  77     113.1619160    5,021.6638922      178.3091670
  78     115.0540106    5,169.2472713      181.8891890
  79     116.9516377    5,319.1598115      185.4901776
  80     118.8547277    5,471.4070104      189.1119291
SUMS  OF  LOGARITHMS  (Concluded)

  x.      S (log x).      S (x log x).      S (log x)².
  81     120.7632127    5,625.9942970      192.7542442
  82     122.6770266    5,782.9270329      196.4169276
  83     124.5961047    5,942.2105145      200.0997884
  84     126.5203840    6,103.8499746      203.8026391
  85     128.4498029    6,267.8505832      207.5252965
  86     130.3843013    6,434.2174510      211.2675808
  87     132.3238206    6,602.9556260      215.0293157
  88     134.2683033    6,774.0701012      218.8103286
  89     136.2176933    6,947.5658118      222.6104500
  90     138.1719358    7,123.4476376      226.4295137
  91     140.1309772    7,301.7204043      230.2673568
  92     142.0947650    7,482.3888844      234.1238194
  93     144.0632480    7,665.4577986      237.9987445
  94     146.0363758    7,850.9318169      241.8919781
  95     148.0140994    8,038.8155594      245.8033687
  96     149.9963707    8,229.1135977      249.7327590
  97     151.9831424    8,421.8304560      253.6800209
  98     153.9743685    8,616.9706114      257.6450022
  99     155.9700037    8,814.5384957      261.6275620
 100     157.9700037    9,014.5384957      265.6275620
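
The three columns of the table of sums of logarithms are running totals of log x, x log x, and (log x)², with common (base 10) logarithms. The sketch below is an illustration added here rather than part of the original appendix, and the function name `log_sums` is hypothetical; it regenerates the rows so that any entry can be checked or the table extended beyond 100.

```python
import math

def log_sums(n):
    """Rows (x, S(log x), S(x log x), S((log x)^2)) for x = 1..n, base-10 logs."""
    s_log = s_xlog = s_log2 = 0.0
    rows = []
    for x in range(1, n + 1):
        lg = math.log10(x)
        s_log += lg          # running sum of log x
        s_xlog += x * lg     # running sum of x log x
        s_log2 += lg * lg    # running sum of (log x) squared
        rows.append((x, s_log, s_xlog, s_log2))
    return rows

if __name__ == "__main__":
    for x, a, b, c in log_sums(100):
        if x in (10, 50, 100):
            print(f"{x:3d}  {a:.7f}  {b:.7f}  {c:.7f}")
```

The first column is also the logarithm of factorial x, so S (log x) at x = 10 is log 10!.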
Abridged list of causes of death, 91, 92
Abscissa defined, 164
Abscissae of binomial, 309, 310
Accounting  machine,  155,  156 
Accuracy  in  making  records,  123,  124 
Adaptation,  purposeful,  of  records,  129 
Addition  nomograms,  192,  193 
Age  as  percentage  deviation  from  mean 
duration  of  life,  252,  254-259 
at marriage, 66, 279-282
errors  in  censuses,  68-71 
index  of  population,  403 
limits  of  fertility,  223 
mean,  at  death,  240,  278,  279 
Agriolimax,  256-258 
Aids  to  biometric  workers,  429 
Altruism  in  making  records,  124 
Ambiguity,  130 

American  Public  Health  Association,  104 
Statistical  Association,  44 
Amsterdam,  population  of,  211 
Anderson,  D.  D.,  428 
Angular  co-ordinates,  165 
Approximations  to  factorial  n,  299 
Area,  land,  of  United  States,  170 
Areas  of  normal  curve,  440-445 
Arithlog  scale,  183-185,  247 
Arithmetic  scale,  182,  183 
Arm  length,  348,  349 
Arosonius,  E.,  43 
Arrangement  of  tables,  116-119 
Array  defined,  376 
Arthur,  W.,  349 
Artificial  feeding  rate,  385 
Auditory  acuity,  347 
Australia,  220,  224,  225,  245,  246,  428 
Austria,  43,  44,  106,  220,  224,  225 
Automobile  life  table,  256-258 
Autopsy  record  form,  161 
Babst, E. D., 187
Bacon,  A.  L.,  9,  106,  162 
Baden,  43 
Baines,  A.,  44 
Baker,  O.  E.,  170 
Baltic  Republics,  106 
Baltimore,  23,  36,  114,  180,  217,  278,  279 
Bar  diagrams,  166-169 
Barlow,  P.,  429 
Bavaria,  43 
Beef,  167,  168 
Beeton,  M.,  385 
Belgium,  44,  106,  220,  224,  225 
Belz,  M.  H.,  428 
Berkson,  J.,  427 
Bertillon,  J.,  77  (portrait),  96 
Bill  of  mortality,  43,  45 
oldest,  48-50 
Billings,  J.  S.,  44,  153 
Binomial, abscissae of, 309, 310
illustrated,  305-308 
standard  deviation  of,  309 
terms  of,  303-310,  331-333 
Biology,  relation  of,  to  biometry,  18 
Biometer,  53 

Biometric  ideas  and  methods,  importance 
of,  in  medicine,  22-25 
Biometry  defined,  18,  21 
history  of,  55-61 
Biostatistics  defined,  21 
Birth  certificate,  standard,  73 
control  record  form,  150 
next,  in  Baltimore,  illustration,  36,  37 
Birth-death  ratio,  229-236 
Birth-rates,  crude,  222-225,  385 
specific,  225,  226 
Blakeman,  J.,  391,  392,  393 
Blatta  orientalis,  256-258 
Blood,  nomogram  for,  193-196 
INDEX


Blood,  relative  cell  volume,  115,  116,  348, 
350-354 

Blood-pressure  in  old  men,  110,  111 
Body  surface,  nomogram  for,  193,  194 
weight,  115,  116,  347,  348,  350-354, 
385,  390,  391,  410-415 
Bookkeeper-teller  illustration,  22 
Boole,  G.,  53 
Bowley,  A.  L.,  62 

Brain  weight,  316,  348,  375-377,  379-383, 
386-389 
Bravais,  A.,  44 
Breathing  capacity,  347 
Brinton,  W.  C.,  169,  197,  202 
Brodetsky,  S.,  191,  203 
Brown,  J.  W.,  220,  237,  385 
Brown,  L.,  22 

Brownlee,  J.,  51,  61,  277,  287 
Bruhns,  C.,  429 
Brunt,  D.,  408,  416,  425 
Buache,  191 
Buday,  L.  v.,  44 
Bulgaria,  220 

Bureau  of  the  Census,  44,  100,  121,  213, 
236,  237,  402,  403 
Burger,  M.  H.,  236,  237 
Burgess,  R.  W.,  41 

Calculation  of  moments,  337-340 
California,  275,  276 
Canada,  43 
Cancer,  385 
illustration,  102,  103 

Cancerous,  age  at  death  of  parents  of,  176 
Card  forms  for  mechanical  tabulation,  145, 
156-159 

Carr,  G.  S.,  429 
Carriere,  H.,  106 
Case  fatality  rates,  23,  221,  222 
histories,  preservation  of,  151,  152 
history  writing,  130-132 
record  method,  123-159 
Causes  of  death,  abridged  list  of,  91,  92 
intermediate  list  of,  88-91 
international  list  of,  77-95 
joint,  95-102 

reliability  of  statistics  of,  102-105 


Causes  not  equally  significant,  34,  35 
Cell volume of blood, 115, 116, 348, 350-353, 365

Census,  age  errors  in,  68-71 

Bureau of the, 44, 100, 121, 213, 236,
237,  402,  403 
method,  63-71 
Cephalic  index,  385 
Certificate  of  birth,  standard,  73 
of  death,  standard,  74,  75 
of  still-birth,  76 
Ceylon,  220 
Charlier,  C.  V.  L.,  61 
Chart,  ratio,  183-185 
Chest  breadth,  348 
circumference,  348 
Chicago,  428 
Chile,  220,  221 
Chi-square  test,  315-326 
Clark,  H.  C.,  320 
Class  limits,  110-112,  361-365 
Classification, dichotomous, 107-110, 112-114, 120
linear,  109-112 
of  rates  and  ratios,  206-209 
Clerk-Maxwell,  32 
Code,  disease,  158 
Coefficient  of  correlation,  378-384 
of  regression,  382,  383,  395,  396 
of  variation,  346-356 
probable  error  of,  346 
Cohn,  A.  E.,  24 

Collection  of  scientific  data,  121-123 
Collis,  E.  L.,  277 
Combinations,  297-299 
Commission for the Prevention of Tuberculosis in France, 189
Complications,  mechanical  tabulation  of, 
157-159 

Compound  variable,  constants  of,  359-361 
Comprehensiveness  of  records,  125 
Concurrent  events,  probability  of,  300-303 
Constants,  437 

measuring  variation,  344-356 
of  a compound  variable,  359-361 
shape,  356-359 
type,  340-344 

Constitutional  factors  in  disease,  133-144 


Construction of life tables, 262, 263
Consumption  of  protein  in  United  States, 
167,  168 

Coolidge,  J.  L.,  291,  314 
Co-ordinates,  angular,  165 
polar,  165,  187 
rectangular,  164,  165 
Corn,  classification  of  kernels,  126-129 
Corrected  death-rates,  265,  269-274 
morbidity  rates,  275,  276 
Correction  for  correlation  ratio,  392,  393 
Correlation  coefficient,  378-384 
probable  error  of,  378 
genesis  of,  366-374 
in  man,  385 

measurement  of,  366-393 
partial,  394-406 
ratio,  386-392 
correction  for,  392,  393 
skew,  386-392 
spurious,  360 
table,  115,  116,  375-377 
Course  of  death-rate  from  tuberculosis, 
181-184 

Creighton,  C.,  48,  61 

Crude  death-rates,  209-211 

Crum,  W.  L.,  120 

Cubit  length,  349 

Cummings,  J.,  44 

Curve  fitting,  407-428 

Cyclic  time  trend  diagrams,  185-190 

Czechoslovakia,  106 

Czuber,  E.,  40 


Dana,  W.  F.,  279 
Darbishire,  A.  D.,  366,  393 
Darwin,  C.,  56 
Data,  collection  of,  121-123 
Davenport,  C.  B.,  44 
Davis, W. H., 102
Death  certificate,  standard,  74,  75 
joint  causes  of,  95-102 
ratios,  228,  229 

Death-rates,  corrected,  265,  269-274 
crude,  209-211 

specific,  212-215,  270,  271,  273,  274 
standardized,  265-269 


Defects  in  medical  records,  131,  132 
Definitions,  18-21 
DeMoivre,  A.,  43 
DeMorgan,  A.,  44 
Denmark,  43,  220,  224,  225 
Deparcieux,  43 
Derham,  W.,  43 
Dermal  sensitivity,  347 
Descartes,  R.,  191 
De  Souza,  D.  H.,  385 
Deviations, probability of relative to probable error and standard deviation, 438, 439

Diabetes  mellitus,  302,  303 
Diagrams,  bar,  166-169 
cyclic  time  trend,  185-190 
defined,  164 

integral  frequency,  176-180 
“pie,” 169
types  of,  165,  166 
Dice,  305-308,  366 

Dichotomous classification, 107-110, 112-114, 120

Difference,  probable  error  of,  282,  283 
significant,  283-287 
Differential  coefficient,  435,  436 
Diphtheria,  laryngeal,  205,  206 
Disease  code,  158 
constitutional  factors  in,  133-144 
Division,  430,  431 
D’Ocagne,  191,  203 
Doering,  C.  R.,  256 
Doolittle,  M.  H.,  425 
Double  dichotomous  tables,  112-115 
Drosophila  melanogaster,  252-258 
Dudfield,  R.,  106 
Duncan,  J.  M.,  226,  237 
Dunn,  H.  L.,  145,  156,  162 
Du  Pasquier,  L.  G.,  417,  427 
Duration  of  life,  252-259,  385 

Edge,  P.  G.,  106 
Edgeworth,  F.  Y.,  60,  62,  313 
Effectiveness  of  public  health  work,  227, 
228 

Eggs,  167,  168,  226,  363,  364,  397-400 
Elderton,  W.  P.,  359,  365 
Ellis, R. L., 408, 416

Embryo,  weight  and  height  of,  389-392, 
410-415 

Emerson,  H.,  11,  104,  105 
England,  43,  44,  98,  106,  220,  224,  225, 
245,  246 

Enlarged  spleen,  320-322 
Epidemic  jaundice,  117-119,  168,  169 
Equation,  personal,  125-129 
Equations,  normal,  409-416,  424,  425 
Errors, age, in censuses, 68-71
d’Espine,  M.,  78 
Essential  hypertension,  133 
Exclusiveness  in  tabulation,  114,  115 
Expectation  of  life,  43,  240 
Experience the basis of probability, 288-292

Experimental  method,  394,  395 
Exposed  to  risk,  204-207 


Factorial n, approximations to, 299
Farr, W., 44, 51, 52 (portrait), 55, 62, 77,
78,  218,  237,  240 
Faure,  F.,  43 
Fawcett,  C.  D.,  385 
Fecundity,  226 
Feldman,  W.  M.,  193,  194 
Femur  length,  348 
Fertility,  226 
age  limits  of,  223 
record  form,  137,  142,  143,  150 
Field,  J.  A.,  184,  203,  250 
Finland,  220 

Fisher,  A.,  60,  229,  237,  314 
Fisher,  I.,  184,  185,  202 
Fisher,  R.  A.,  41,  319,  378 
Fitting  a logarithmic  curve,  413,  414 
a logistic  curve,  420-427 
a parabola,  411,  412 
a straight  line,  411 
Flies,  life  table  for,  252-258 
Foot  length,  349 
Force  of  mortality  defined,  204 
Forearm  length,  348 
Foreign  born,  229-236 
Forms  for  medical  records,  133-145,  150, 
156-159,  161 


Formulae,  429-437 
Forsyth’s  approximation,  299 
Four-fold  table,  317-322 
Fractional  powers,  434 
Fragestellung,  122 

France,  43,  96,  106,  189,  220,  224,  225 
Frechet,  M.,  191,  203 
Frequencies,  probable  error  of,  337 
Frequency,  19,  20,  376 
distribution,  335-337 
polygons,  169,  174-176 

Gall-bladder,  302,  303 
Gall-stones,  131 

Galton,  F.,  18,  44,  55,  56  (portrait),  57, 
58,  175,  313,  376 

Gauss,  K.  F.,  43,  54,  57,  59,  312,  425 
Genesis  of  correlation,  366-374 
Germany,  43,  44,  98,  220,  245,  246 
Glover,  J.  W.,  240,  241,  244,  245,  246,  263, 
266,  299,  403,  429 
Glycosuria,  302,  303 
Godfrey,  E.  H.,  43 
Gover,  M.,  428 
Gram,  J.  P.,  61 

Graphic  representation,  164-203 
of  relative  variability,  349-356 
work,  standards  in,  196-202 
Graunt,  J.,  43,  45-47,  51 
Great  Britain,  43 
Greece,  44 

Greenwood,  M.,  11,  22,  41,  51,  60  (portrait), 
62,  218,  220,  229,  237,  252,  256, 
264,  277,  349,  385 
Griffin,  C.  E.,  256 
Grimm,  H.,  11 

Group,  statistical  method  as  description 
of,  28-31 

Growth  of  population,  44,  417-428 
Guillard,  A.,  78 

Hair  color,  323-326 
Halley,  E.,  43,  47  (portrait),  48 
Hamblen,  A.  D.,  184,  185,  203 
Hand  length,  348 
rapidity  of,  347 
steadiness  of,  347 




Hankins,  F.  H.,  61 
Hardman,  R.  P.,  205,  206 
Harmon,  G.  E.,  349 
Hase,  A.,  256 
Haskell,  A.  C.,  202 
Hawkes,  O.  A.  M.,  49 
Hayward,  T.  E.,  263 
Head  height,  171-175,  177-179 
length,  349 

Head-neck  length,  348 
Heart  Committee,  New  York  Tuberculosis 
and  Health  Association,  24 
Heart,  organic  diseases  of,  402-406 
weight,  347,  385 

Height  of  embryo,  389-392,  410-415 
Henderson,  L.  J.,  193-196,  203 
Henderson,  R.,  263 
Heron,  D.,  385 
Hezlet,  R.  K.,  191,  203 
Hill,  L.,  256 
Histograms,  169-174 
History  of  biometry,  55-61 
of  science,  17 
of  vital  statistics,  42-62 
writing,  130-132 
Hoffman,  F.  L.,  77 
Hollerith,  H.,  44,  145,  153,  156,  162 
Holzinger,  K.  J.,  349 
Home,  value  of,  66 
Homicide,  104 
Hooker,  R.  H.,  105 
Hookworm,  188,  227,  326-329 
Hooper,  W.,  41 

Hospital  statistics,  131,  132,  145,  152, 
156-159,  222 

Howard,  W.  T.,  9,  162,  180,  205,  236 
Hoyer,  B.,  145,  162 
Hull,  C.  H.,  43,  61 
Humerus  length,  348 
Humphreys,  N.,  44,  51,  53,  62 
Hungary,  44,  106,  220,  224,  225 
Huygens,  C.,  43,  47 
Hydra  fusca,  256-258 
Hypertension,  essential,  133 

Ideals  in  making  of  scientific  records,  123-130 

Incidence  rates,  227 


Inclusiveness  of  observations,  125,  129, 
130 

Indeterminism  not  implied  in  statistical 
method,  32,  33 

Index,  age,  of  population,  403 
vital,  229-236 
Indexing,  153-159 
India,  44,  245,  246 

Individual,  statistical  method  as  prediction 
of,  36-38 

Infant  mortality,  47,  215-221,  344,  385 
rates,  215-221 
Infantile  paralysis,  275,  276 
Influenza  epidemic,  108-110,  402-406 
incidence  among  tuberculous,  108- 
110 

Inheritance,  56 

Institute  for  Biological  Research,  133 
of  Actuaries,  44 

Integral  curve  with  percentage  scale,  179 
frequency  diagrams,  176-180 
Integrals,  436,  437 
Intelligence  quotient,  347 
Interlabral  height,  347 
Intermediate  List  of  Causes  of  Death,  88- 
91 

International  Commission  on  Causes  of 
Death,  recommendations  of,  92,  93 
Health  Board,  188-190 
List  of  Causes  of  Death,  77-95 
Inter-nipple  breadth,  348 
Ireland,  106,  220,  224,  225 
Isserlis,  L.,  60 

Italy,  44,  220,  224,  225,  245,  246 


Jamaica,  220,  221 
Japan,  220,  236,  237 
Jaundice,  epidemic,  117-119,  168,  169 
Jennings,  H.  S.,  34 
Jensen,  A.,  43 
Jeter,  H.  R.,  428 
Johannsen,  A.,  11 
Johns  Hopkins  Hospital,  112 
University,  133 
Joint  causes  of  death,  95-102 
Jones,  D.  C.,  40 
Julin,  A.,  44 




Kaufman,  A.,  44 
Keenness  of  sight,  347 
Kenyon,  F.,  49 
Key  punch,  153 
Kiaer,  A.  N.,  43 
Kidney  weight,  347 
Kilgore,  E.  S.,  41 

Knibbs,  G.  H.,  223,  224,  225,  226,  237 
Knight,  F.  H.,  162 
Koren,  J.,  61 
Kurtosis,  358,  359 
probable  error  of,  358 

Lal,  M.,  385 

Land  area  of  United  States,  170 
Laplace,  P.  S.,  43,  45,  54,  57  (portrait), 
59,  312,  313 

Laryngeal  diphtheria,  205,  206 
League  of  Nations  Health  Organization, 
106 

Least  squares,  method  of,  408-416,  423- 
427 

Le  Blanc,  T.  J.,  11,  236,  237 
Lee,  A.,  349 

Legibility  of  records,  124 
Life,  duration  of,  252-259,  385 
expectation  of,  43,  240 
table,  48,  53,  229,  238-264 
construction  of,  262,  263 
Farr  on,  53 

for  lower  organisms,  252-259 
Halley’s,  48 
nomogram,  246-252 
population,  259-262 

Limit  of  binomial,  normal  curve  as,  310- 
313 

Limitations  of  partial  correlation  method, 
401,  402 

Limits,  class,  110-112,  361-365 
table  of  sampling,  330 
Linear  classification,  109-112 
regression,  376-384 
Litchfield,  H.  R.,  205,  206 
Liver  weight,  347,  348,  385 
Logarithmic  curve,  fitting  of,  413,  414 
scale,  183,  184 
Logarithms,  434 


Logarithms,  sums  of,  446-448 
Logistic  curve,  44,  417-428 
London  Hospital,  23 
Longevity  record  form,  145-149 
Lotka,  A.  J.,  211,  236,  417,  427,  428 
Lottin,  J.,  43,  44,  61 

Lower  organisms,  life  tables  for,  252-259 
Lung  capacity,  347 

MacDonald,  D.,  323 
Macdonnell,  W.  R.,  349,  385 
Maize,  classification  of  kernels,  126-129 
Malaria  parasites,  320-322 
Male  birth,  probability  of,  294,  312,  313 
Man,  correlation  in,  385 
variation  in,  347-349 
Mandible,  348 
Maps,  statistical,  188-190 
Marriage,  age  at,  66,  279-282 
Mathematical  formulae,  429-437 
Matiegka,  H.,  375 
Mauritius,  428 

Mean,  340,  341,  346,  350-356 
age  at  death,  240,  278,  279 
probable  error  of,  341 
Measles,  190,  323-325 
Measurement  of  correlation,  366-393 
of  variation,  335-365 
Mechanical  tabulation,  44,  145,  152-160, 
162,  163 

card  forms  for,  145,  156-159 
of  complications,  157-159 
Median,  341,  342 
probable  error  of,  342 
Medical  records,  defects  in,  131,  132 

forms  for,  133-145,  150,  156-159,  161 
Medicine,  importance  of  biometric  ideas 
and  methods  in,  22-25 
Menzler,  F.  A.  A.,  163 
Mercandin,  44 
Mercier,  C.,  334 
Merz,  T.,  27 

Method  of  least  squares,  408-416,  423-427 
of  studying  therapeutic  problem,  22-24 
Mexicans,  67 
Meyer,  R.,  43,  44 
Miliary  tuberculosis,  112-114 




Milk  production,  20,  21,  41 
Mills,  F.  C.,  40 

Miner,  J.  R.,  9,  11,  41,  115,  236,  349,  354, 
365,  402,  404,  406 
Minnesota,  275,  276 
Mitchell,  A.  G.,  145,  162 
Mode,  342-344 
probable  error  of,  343 
Moments,  calculation  of,  337-340 
Monk,  A.  T.,  428 
Morant,  G.  M.,  349 
Morbidity,  force  of,  204 
mortality  as  measure  of,  103 
nomenclature  of,  94,  95 
rates,  227,  228 
corrected,  275,  276 

Mortality  as  measure  of  morbidity,  103 
bill  of,  43,  45 
force  of,  defined,  204 
infant,  47,  215-221,  344,  385 
rates,  215-221 
oldest  bill  of,  48-50 
urban  vs.  rural,  47,  219,  221 
Mortara,  G.,  40 
Mouse  life  table,  256-258 
Mouth  breadth,  348 
Mule,  illustration,  35 
Multiplication,  429,  430 
Murphy,  T.  F.,  11 
Musselman,  J.  R.,  349 


Nasal  breadth,  lower,  348 
depth,  347,  348 
Natality,  force  of,  204 
Native  born,  230-236 
Nature  of  statistical  world,  27,  32-35 
Negro,  154,  230-232,  327,  328,  344,  347- 
349,  428 

Netherlands,  43,  106,  220,  221,  224,  225 
New  York  City,  186,  187,  217,  428 
State,  117-119,  168,  204,  205,  231 
Tuberculosis  and  Health  Association, 
24 

New  Zealand,  220 
Ney,  M.,  105,  106 
Niceforo,  A.,  41 
Noble,  R.  E.,  77 


Nomenclature  of  morbidity,  94,  95 
Nomograms,  191-196,  246-252 
addition,  192,  193 
for  blood,  193-196 
for  body  surface,  193,  194 
life  table,  246-252 
Non-linear  regression,  386-392 
Normal  curve,  43,  285,  286,  310-313,  316, 
317,  331-333,  343,  357,  358,  440- 
445 

areas  of,  440-445 
as  limit  of  binomial,  310-313 
ordinates  of,  440-445 
equations,  409-416,  424,  425 
Norway,  43 
Nosology,  77 

Occupation,  385 
Ogives,  175,  176,  178-180 
Ogle,  W.,  55,  62 
Oldest  bill  of  mortality,  48-50 
Old  men,  blood  pressure  in,  110,  111 
Oral  temperature,  349,  385 
Ordinate  defined,  164,  165 
Ordinates  of  normal  curve,  440-445 
Organic  diseases  of  the  heart,  402-406 
Original  registration  states,  181,  241-244, 
260,  266 
Osler,  W.,  23 

Parabola,  fitting  of,  411,  412 
Parents  of  tuberculous  and  cancerous,  age 
at  death  of,  176 
Parker,  S.  L.,  263 
Partial  correlation,  394-406 

method,  limitations  of,  401,  402 
Patton,  A.  C.,  120 

Pearl,  R.,  41,  62,  106,  115,  120,  162,  163, 
166,  176,  226,  236,  237,  246,  256,  263, 
264,  316,  354,  359,  365,  375,  397,  402, 
406,  417,  427,  428 

Pearson,  K.,  9,  44,  58  (portrait),  59,  60, 
62,  191,  314,  315,  317,  319,  322,  323, 
333,  334,  341,  343,  345,  346,  349,  357, 
358,  360,  365,  366,  385,  386,  392,  393, 
395,  406,  410,  429 


Peirce,  C.  S.,  313 
Pell,  C.  E.,  229 
Pelvic  diameters,  385 
Penny  tossing,  288-294,  300,  301,  303-305, 
307-310,  366-374 
Percentage  frequency,  179 
Permanence  of  records,  124,  125 
Permutations,  295-297 
Personal  equation,  125-129 
Petty,  W.,  43,  61 
Philadelphia,  186,  187,  217 
“Pie”  diagrams,  169 
Pigmentation,  323-325 
Pikler,  J.  J.,  96 

Pneumonia,  23,  102,  103,  153-155 
Point  binomial,  303-311,  331-333 
Poisson,  S.,  53,  54 
Polar  co-ordinates,  165,  187 
Poliomyelitis,  275,  276 
Polygons,  frequency,  169,  174-176 
Population,  age  index  of,  403 
growth,  44,  417-428 
life  table,  259-262 
of  Amsterdam,  211 
standard,  271-274 
stationary,  259-262 
Portugal,  106 
Poultry,  167,  168 
Poverty  rate,  220,  221,  385 
Powers,  431-433 

Preservation  of  case  histories,  151,  152 
Proales  decipiens,  255-258 
Probability,  experience  basis  of,  288-292 
measure  of,  defined,  292 
of  concurrent  events,  300-303 
of  deviations  relative  to  probable  error, 
438 

of  male  birth,  294,  312,  313 
special  theorems  in,  315-334 
theory  of,  40,  282-287,  288-334 
Probable  error,  18,  38,  39,  278-287,  363, 
406,  438 

of  coefficient  of  variation,  346 
of  correlation  coefficient,  378 
of  difference,  282,  283 
of  frequencies,  337 
of  kurtosis,  358 
of  mean,  341 
of  median,  342 
of  mode,  343 
of  partial  (net)  correlation  coefficient, 
406 

of  skewness,  357 
of  standard  deviation,  345 
Proportion,  434,  435 

Protein,  consumption  of,  in  United  States, 
167,  168 

Providence,  R.  I.,  266-271,  273 
Prudential  Insurance  Company,  186 
Prussia,  43,  220,  224,  225 
Psychology,  17 

Public  Health  Service,  U.  S.,  227 
Public  health  work,  effectiveness  of,  227, 
228 

Puerperal  septicemia,  204,  205 
Pulse  rate,  283,  335-348,  357,  358,  385 
Purpose  of  tabulation,  107 


Quetelet,  L.  A.  J.,  43,  44,  51,  52  (portrait), 
57,  61 

Radius  length,  348 
of  gyration,  345 
Randomness,  290 
Rapidity  of  hand,  347 
Rates,  204-228 
and  ratios,  204-237 

classification  of,  206-209 
birth-,  222-226 
case  fatality,  221,  222 
morbidity,  227,  228 
Ratio  chart,  183-185 
Ratios,  204,  228-236 
birth-death,  229-236 
death,  228,  229 
Rau,  P.,  256 
Reaction  time,  347 

Recommendations  of  International  Commission 
on  Causes  of  Death,  92,  93 
Recorde,  R.,  16 

Records,  scientific,  ideals  in  making  of, 
123-130 

Rectangular  co-ordinates,  164,  165 
Reed,  L.  J.,  9,  11,  246,  263,  359,  397,  417, 
418,  427,  428 




Registrar-General,  44,  99,  223,  264,  272 
Registration  area,  44,  74,  181,  182,  213, 
214,  219,  220,  230,  231,  232,  234, 
294 

method,  71,  72 

states,  original,  181,  241-244,  260,  266 
Regression,  376-383,  386-392 
coefficient,  382,  383,  395,  396 
non-linear,  386-392 

Relative  variability,  graphic  representation 
of,  349-356 

Reliability  of  statistics  of  causes  of  death, 
102-105 

Respiration  rate,  347,  385 
Rietz,  H.  L.,  41,  374 
Riley,  R.  H.,  275 
Rioch,  M.  G.,  162 
Roach,  life  table  for,  256-258 
Robertson,  T.  B.,  417 
Rock,  F.,  385 
Rockwood,  R.,  156,  162 
Roots,  433 
Rose,  W.,  188,  189 
Rossiter,  W.  S.,  43,  44,  61,  105 
Rotifer,  life  table  for,  255-258 
Roullet,  H.,  191,  203 
Royal  Air  Force,  163 
Statistical  Society,  42,  44 
Rubin,  M.,  229 
Running,  T.  R.,  408,  416 
Rural  vs.  urban  mortality,  47,  219,  221 
Russia,  44,  220 


Sampling,  326-333,  337 
limits,  table  of,  330 
Saxony,  43 
Scandinavia,  61,  106 
Scarlet  fever,  323-325 
Scatter  diagrams,  190,  191 
Schiller,  F.  C.  S.,  334 
Schultz,  H.,  428 
Schuster,  E.,  385 
Science,  history  of,  17 
Scientific  data,  collection  of,  121-123 
Scotland,  106,  220,  224,  225 
Seattle,  217,  266-271,  273 
Septicemia,  puerperal,  204,  205 


Serbia,  220 
Sex  ratio,  45 

Shape  constants,  356-359 
Sheppard,  W.  F.,  337,  339,  365,  380 
Shull,  G.  H.,  56 
Sight,  keenness  of,  347 
Significant  difference,  283-287 
Singer,  F.,  280 
Skew  correlation,  386-392 
frequency  curves,  59,  359 
logistic  curve,  427 
Skewness,  356-358 
probable  error  of,  357 
Skull,  variation  in,  348,  349,  380 
Slug,  life  table  for,  256-258 
Smallpox,  385 
illustration,  107,  108 
Smits,  E.,  44 
Snow,  E.  C.,  263,  264 
Societe  de  statistique  de  Paris,  44 
Sociology,  17 
Soper,  H.  E.,  60 
Soreau,  192,  203 
Sorter,  153,  154 
Space  base  of  statistics,  20 
Spain,  43,  106,  220 

Special  theorems  in  probability,  315-334 
Specific  birth-rates,  225,  226 
death-rates,  212-215,  270,  271,  273,  274 
Spleen,  enlarged,  320-322 
weight,  347 
Spot  maps,  188-190 
Spurious  correlation,  360 
Standard  deviation,  345,  346,  439 
of  binomial,  309 
probable  error  of,  345 
million,  260-262,  272,  273 
population,  271-274 
Standardized  death-rates,  265-269 
Standards  in  graphic  work,  196-202 
Stationary  populations,  259-262 
Statistical  maps,  188-190 

method  as  description  of  group,  28-31 
defined,  19,  21 
world,  nature  of,  27,  32-35 
Statistics  defined,  19 
on  space  base,  20 
on  time  base,  20 




Stature,  19,  349,  350-354,  359,  385 
Steadiness  of  hand,  347 
Stevenson,  T.  H.  C.,  51,  55 
Still-birth  certificate,  76 
Still-births,  causes  of,  93,  94 
Stirling’s  theorem,  299 
Stockholm,  23 
Stocks,  P.,  249 
Straight  line,  fitting  of,  411 
Streeter,  G.  L.,  389 
Strength  of  grip,  347 
of  pull,  347,  385 
Stuart,  C.  A.  V.,  43 
“Student,”  341,  365 
Sugar  crops,  187,  190 
Suicide,  104,  406 
Sums  of  logarithms,  434 
Sundbarg,  G.,  229 
Surface,  F.  M.,  226,  354,  397 
Survivorship  distributions,  238-245,  247- 
258 

Süssmilch,  J.  P.,  43,  51 
Sutton,  A.  C.,  162 
Sutton,  F.  D.,  9,  332 

Sweden,  43,  220,  224,  225,  245,  246,  316, 
348,  421-426 
Sweeney,  J.  S.,  230,  237 
Swiftness  of  blow,  347 
Switzerland,  44,  106,  220,  224,  225 
Symptomatology  of  epidemic  jaundice, 
117-119,  168,  169 
Szabo,  I.,  256 
Szabo,  M.,  256 

Table  of  sampling  limits,  330 
Tables,  arrangement  of,  116-119 
double-dichotomous,  112-115 
Tabular  presentation,  107-120 

review  of  history  of  statistics,  42-45 
Tabulation,  exclusiveness  in,  114,  115 
mechanical,  44,  145,  152-160,  162,  163 
purpose  of,  107 
Tabulator,  155,  156 
Tatham,  J.,  55,  99 
Tebb,  A.  E.,  229,  237 
Technical  terminology,  18 
Teller-bookkeeper  illustration,  22 


Temperature,  oral,  349,  385 
Terms  of  binomial,  303-310,  331-333 
Test,  chi-square,  315-326 
Tewksbury,  R.  B.,  11 
Theology,  51 

Theory  of  probability,  40,  282-287,  288- 
334 

of  statistics  defined,  19 
Therapeutic  problem,  method  of  studying, 
23,  24 

Thiele,  T.  N.,  61 
Thyroid,  variation  of,  347,  348 
Tibia  length,  348 
Tildesley,  M.  L.,  349 
Time  base  of  statistics,  20 
trend  diagrams,  180-190 
Tocher,  J.  F.,  170,  171,  359,  361 
Todd,  T.  W.,  349 
Todhunter,  I.,  44 
Tonsil,  illustration,  132 
Torso  length,  348,  349 
Traumatism,  104 

Tuberculosis,  course  of  death-rate  from, 
181-184,  237 
miliary,  112-114 

Tuberculous,  age  at  death  of  parents  of, 
176 

incidence  of  influenza  among,  108-110 
Twenty-five  per  cent.  reduction,  illustration, 
182-184 
Type  constants,  340-344 
Types  of  diagrams,  165,  166 
Typhoid  fever,  19,  20,  131,  180-184,  221, 
222 

Umanski,  A.  J.  V.,  193,  194 
Uncinariasis,  188,  227,  326-329 
United  States,  43,  44,  72,  100,  167,  170, 
181,  213,  214,  219,  220,  231,  232,  234, 
241-246,  248,  261,  262,  263,  272,  294, 
365,  376,  402-406,  427,  428 
Urban  vs.  rural  mortality,  47,  219,  221 
Uruguay,  220,  221 

Vaccination,  385 

Variability,  graphic  representation  of  relative, 
349-356 




Variable,  constants  of  compound,  359-361 
Variation  in  man,  347-349 
measurement  of,  335-365 
Venn,  J.,  313,  365 
Verhulst,  P.  F.,  44,  61,  417,  427 
Visual  acuity,  347 
Vital  capacity,  347 
index,  229-236 
statistics  defined,  21 
history  of,  42-62 
von  Huhn,  R.,  179,  202 

Walker,  H.  M.,  43,  61 
Watkins,  G.  P.,  120 
Weight  of  embryo,  389-392,  410-415 
of  heart,  347,  385 
of  infants,  385 
of  liver,  347,  348,  385 
Welch,  W.  H.,  Dedication 


Weldon,  W.  F.  R.,  44,  58 

Wells,  T.  S.,  145 

Wernicke,  J.,  229 

Wheat,  167,  168 

Whipple,  G.  C.,  184,  185,  203 

Whiteley,  M.  A.,  385 

Whiting,  M.  H.,  335,  385 

Whooping-cough,  186,  187 

“Who’s  Who,”  example  from,  279-282 

Wicksell,  S.  D.,  61 

Williams,  H.,  117 

Wolff,  G.,  41 

Wood,  F.,  385 

Wright,  A.,  22 

Wurzburger,  E.,  43,  44 


Yule,  G.  U.,  8,  11,  19,  40,  44,  59  (portrait), 
61,  120,  125,  162,  287,  313,  365, 
379,  393,  395,  406,  427 




